## Invasive Plant Data of Kentucky (Based on Data from 2013)

In [1]:
# Source: https://www.invasiveplantatlas.org/list.html?id=24

#----------------------------------------------------
# Goals
#----------------------------------------------------
# . Read in data from two csv files:
# 1. The first df (df1) lists invasive plant species by both common and scientific names, as well as nativity.
# 2. The second df (df2) has the same plants but adds a 'Threat Level' column describing plant invasiveness.
# . Combine the two dfs to create one main df (main_df).
# . Use matplotlib/seaborn to visually display plant stats.
# . Create a dictionary & function so a user can enter a plant name and see the details for that specific plant.

#----------------------------------------------------
# Features
#----------------------------------------------------
# . Read data from an external file, such as text, JSON, CSV, etc and use that data in your application.
# . Visualize data in a graph, chart, or other visual representation of data
# . Create and call at least 3 functions or methods, at least one of which must return a value that is used 
# somewhere else in your code. To clarify, at least one function should be called in your code, that function 
# should calculate, retrieve, or otherwise set the value of a variable or data structure, return a value to 
# where it was called, and use that value somewhere else in your code.

#----------------------------------------------------
# Additional Features
#----------------------------------------------------
# . Use pandas, matplotlib, and/or numpy to perform a data analysis project. Ingest 2 or more pieces of data, 
# analyze that data in some manner, and display a new result to a graph, chart, or other display

In [1]:
# Imports 'pandas' into the script.
import pandas as pd

In [2]:
# Displays every row for all dfs in the notebook.

# pd.set_option('display.max_rows', 176)

In [2]:
# Reads in data from a local csv file and creates a data frame (df), which is assigned to 'df1'.
df1 = pd.read_csv('data/Invasive_Plants_List - Sheet1.csv', encoding='utf8')

# Displays the first 5 rows of the df to see what I'm working with.
df1.head()

Unnamed: 0,Scientific Name,Subject Name,Family,U.S. Nativity
0,Acer platanoides L.,Norway maple,Aceraceae,Exotic
1,Achyranthes japonica (Miq.) Nakai,Japanese chaff flower,Amaranthaceae,Exotic
2,Agrostis stolonifera L.,creeping bentgrass,Poaceae,Exotic
3,Ailanthus altissima (P. Mill.) Swingle,tree-of-heaven,Simaroubaceae,Exotic
4,Akebia quinata (Houtt.) Dcne.,chocolate vine,Lardizabalaceae,Exotic


In [10]:
#----------------------------------------------------
# Cleaning and manipulating for df1. 
#----------------------------------------------------

# Renames the 'Subject Name' col to 'Common Name' so it matches the col name in df2.
# Matching col names make it easier to merge the two dfs later.
df1.rename(columns={'Subject Name': 'Common Name'}, inplace=True)

# Displays the first 5 rows of the df after it's been cleaned. 
df1.head().to_dict('list')

{'Scientific Name': ['Acer platanoides L.',
  'Achyranthes japonica (Miq.) Nakai',
  'Agrostis stolonifera L.',
  'Ailanthus altissima (P. Mill.) Swingle',
  'Akebia quinata (Houtt.) Dcne.'],
 'Common Name': ['Norway maple',
  'Japanese chaff flower',
  'creeping bentgrass',
  'tree-of-heaven',
  'chocolate vine'],
 'Family': ['Aceraceae',
  'Amaranthaceae',
  'Poaceae',
  'Simaroubaceae',
  'Lardizabalaceae'],
 'U.S. Nativity': ['Exotic', 'Exotic', 'Exotic', 'Exotic', 'Exotic']}

In [4]:
# Reads in df2 which is saved in the 'data' folder of Invasive_Plants.
df2 = pd.read_csv('data/Invasive_Plants_Threat_Levels - Sheet1.csv', encoding='utf8')


# Again, displaying the first 5 rows of the df to get a look at the data.
df2.head()

Unnamed: 0,Scientific Name,Common Name,Threat Level
0,Achyranthes japonica,Japanese chaff flower,1
1,Ailanthus altissima,tree-of-heaven,1
2,Alliaria petiolata,garlic mustard,1
3,Ampelopsis brevipedunculata,porcelain berry,1
4,Arthraxon hispidus,hairy jointgrass,1


In [9]:
#----------------------------------------------------
# Cleaning and manipulating for df2.
#----------------------------------------------------

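# Alphabetizes df2 by the 'Scientific Name' col.
# sort_values() returns a new df, so the result is assigned back to df2.
# (pd.merge matches rows by key value, so sorting is optional for the merge itself.)
df2 = df2.sort_values(by='Scientific Name')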

# # df1.col1= df1.col1.astype(str)
# # df2.col2 = df2.col2.astype(str)

# df1.col1 = df1.col1.str.encode('utf-8')
# df2.col2 = df2.col2.str.encode('utf-8')

# Again, just making sure the change took place with the first few rows.
df2.head().to_dict('list')

{'Scientific Name': ['Achyranthes japonica',
  'Ailanthus altissima',
  'Alliaria petiolata',
  'Ampelopsis brevipedunculata',
  'Arthraxon hispidus'],
 'Common Name': ['Japanese chaff flower',
  'tree-of-heaven',
  'garlic mustard',
  'porcelain berry',
  'hairy jointgrass'],
 'Threat Level': [1, 1, 1, 1, 1]}

In [11]:
#----------------------------------------------------
# Merging/joining the dfs to make one main df.
#----------------------------------------------------

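# With no 'on' argument, pd.merge joins on every col the two dfs share
# (here 'Scientific Name' and 'Common Name').
# The result is empty: df1's scientific names include the author citation
# (e.g. 'Achyranthes japonica (Miq.) Nakai') while df2's do not, so no rows match on both cols.
main_df = pd.merge(df1, df2)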
main_df

Unnamed: 0,Scientific Name,Common Name,Family,U.S. Nativity,Threat Level
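
One possible way to get a non-empty result is to join on 'Common Name' alone, since that column appears to be spelled the same way in both files. The sketch below uses small stand-in frames copied from the `head()` outputs above, not the real CSVs, and the names `sample_df1`, `sample_df2`, `sample_empty`, and `sample_main` are hypothetical. Whether common names match for every row in the full data is an assumption that would need checking.

```python
import pandas as pd

# Stand-in rows copied from the head() outputs above (not the full CSVs).
sample_df1 = pd.DataFrame({
    'Scientific Name': ['Acer platanoides L.',
                        'Achyranthes japonica (Miq.) Nakai',
                        'Ailanthus altissima (P. Mill.) Swingle'],
    'Common Name': ['Norway maple', 'Japanese chaff flower', 'tree-of-heaven'],
    'Family': ['Aceraceae', 'Amaranthaceae', 'Simaroubaceae'],
    'U.S. Nativity': ['Exotic', 'Exotic', 'Exotic'],
})
sample_df2 = pd.DataFrame({
    'Scientific Name': ['Achyranthes japonica', 'Ailanthus altissima', 'Alliaria petiolata'],
    'Common Name': ['Japanese chaff flower', 'tree-of-heaven', 'garlic mustard'],
    'Threat Level': [1, 1, 1],
})

# Default merge: joins on both shared cols, so the differing scientific names block every match.
sample_empty = pd.merge(sample_df1, sample_df2)

# Assumed fix: join on 'Common Name' only and bring over just the 'Threat Level' col.
sample_main = pd.merge(
    sample_df1,
    sample_df2[['Common Name', 'Threat Level']],
    on='Common Name',
    how='inner',
)
print(sample_main)
```

An inner join keeps only plants found in both frames. Passing `how='left'` instead would keep every row of the first frame and leave 'Threat Level' empty where no match exists.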


In [21]:
#----------------------------------------------------
# Cleaning and manipulating the main_df.
#----------------------------------------------------

# Separate plant family groups and pick the 5 largest groups. (Use pandas collab book for ex.)
# Use this info for visualizations.

# This line looks at the 'Family' col within the main_df. 
# main_df['Family'].unique()

# Creates a new variable which holds the values of the 'Family' col.
# fam_group = main_df['Family'].value_counts()

# Selects the top 5 largest groups.
# fam_group.nlargest(5).index

# Selects the data from the top 5 family groups.
# main_df = main_df[main_df['Family'].isin(fam_group.nlargest(5).index)].copy()
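
The commented-out steps above could be wrapped in a single function that returns a value, as in the rough sketch below. The function name `top_families` and the `demo_families` frame are hypothetical. The family names come from the `head()` output earlier, but the counts are invented purely for illustration.

```python
import pandas as pd

def top_families(df, n=5):
    """Return the rows of df that belong to the n most common families."""
    # Count how many species fall in each family.
    fam_counts = df['Family'].value_counts()
    # Keep only the names of the n largest families.
    top = fam_counts.nlargest(n).index
    # Filter the df down to those families; copy() avoids modifying the original.
    return df[df['Family'].isin(top)].copy()

# Invented stand-in data (counts are not from the real CSVs).
demo_families = pd.DataFrame({
    'Family': ['Poaceae', 'Poaceae', 'Poaceae', 'Aceraceae', 'Aceraceae', 'Simaroubaceae'],
})
top_two = top_families(demo_families, n=2)
print(top_two['Family'].value_counts())
```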

In [4]:
# Imports 'seaborn' into the script.
import seaborn as sns

In [5]:
#----------------------------------------------------
# TEST
#----------------------------------------------------

# From the seaborn docs, used to make sure seaborn works as it's supposed to within this notebook.
# df = sns.load_dataset("penguins")
# sns.pairplot(df, hue="species")

In [None]:
# Return visualizations for data.
# Main one showing an overview of all plants, from least to most invasive species.
# Threat levels on the x-axis; species counts, in steps of 10, on the y-axis.
# Second one showing family groups by number of species within each family.
# Family names on the x-axis; species counts, in steps of 10, on the y-axis.
# Third, and maybe most impressive, will be a bar chart showing native vs. exotic species.
# Plot graphs in shades of green, yellow, orange, and red, with green for the watch list and red for severe.
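
A minimal sketch of the first chart might look like the following, using matplotlib only. The function name `plot_threat_counts`, the `THREAT_COLORS` mapping, and the demo data are all hypothetical. In particular, the sketch assumes four threat levels with 1 as the most severe (red) and 4 as the watch list (green), which the data shown so far does not confirm.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Assumed mapping: 1 = severe (red) down to 4 = watch list (green).
THREAT_COLORS = {1: 'red', 2: 'orange', 3: 'yellow', 4: 'green'}

def plot_threat_counts(df):
    """Draw a bar chart of species counts per threat level and return the Axes."""
    # Count species at each level, ordered by level number.
    counts = df['Threat Level'].value_counts().sort_index()
    # Unknown levels fall back to gray.
    colors = [THREAT_COLORS.get(level, 'gray') for level in counts.index]

    fig, ax = plt.subplots()
    ax.bar(counts.index.astype(str), counts.values, color=colors)
    ax.set_xlabel('Threat Level')
    ax.set_ylabel('Number of species')
    ax.set_title('Invasive plant species by threat level')
    # Place y-axis ticks in steps of 10.
    ax.yaxis.set_major_locator(MultipleLocator(10))
    return ax

# Invented stand-in data (not from the real CSVs).
demo_threats = pd.DataFrame({'Threat Level': [1, 1, 2, 3, 3, 3]})
threat_ax = plot_threat_counts(demo_threats)
```

Returning the Axes lets a later cell adjust labels or save the figure. The same pattern could be reused for the family and nativity charts by swapping the column name.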

In [None]:
#----------------------------------------------------
# Dictionary
#----------------------------------------------------

# Use data to create a dict so user can enter specific info and find plant invasiveness, family groups, etc.

import Invasive_Plant_Dict
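
The dictionary-plus-function idea could be sketched as below. This is not the contents of `Invasive_Plant_Dict`, which are not shown here. The function names `build_plant_dict` and `lookup_plant` are hypothetical, and the demo rows are copied from the `head()` outputs above.

```python
import pandas as pd

def build_plant_dict(df):
    """Map each lowercased common name to a dict of that plant's details."""
    return {rec['Common Name'].lower(): rec for rec in df.to_dict('records')}

def lookup_plant(plant_dict, name):
    """Return the details for a plant name, or None if it is not listed."""
    # Strip spaces and ignore case so 'Tree-of-Heaven ' still matches.
    return plant_dict.get(name.strip().lower())

# Stand-in rows copied from the head() outputs above.
demo_plants = pd.DataFrame({
    'Common Name': ['Japanese chaff flower', 'tree-of-heaven'],
    'Family': ['Amaranthaceae', 'Simaroubaceae'],
    'U.S. Nativity': ['Exotic', 'Exotic'],
    'Threat Level': [1, 1],
})
plants = build_plant_dict(demo_plants)
details = lookup_plant(plants, ' Tree-of-Heaven ')
print(details)
```

In the notebook, the name could come from `input()` and the returned dict could be printed field by field. Returning `None` for unknown names lets the caller decide how to report a miss.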

#### Possible Future Features

In [5]:
# Create a method so a user can view the visualization for all plants at a single threat level.
# Gather more data to show growth rates over time. (Fingers crossed on that one.)
# Gather more data to find the top 5, 10 & 50 most invasive species.