## Invasive Plant Data of Kentucky (Based on Data from 2013)

In [1]:
# Source: https://www.invasiveplantatlas.org/list.html?id=24

#----------------------------------------------------
# Goals
#----------------------------------------------------
# . Read in data from two source files:
# 1. The first df (df1, from a csv) lists invasive plant species by both common and scientific names as well as nativity.
# 2. The second df (df2, also from a csv) has the same plants but lists an additional column of plant invasiveness.
# . Combine the two dfs to create one main df (df_main).
# . Use matplotlib/seaborn to visually display plant stats.
# . Create a dictionary & function so a user can enter a plant name and see the details for that specific plant.

#----------------------------------------------------
# Features
#----------------------------------------------------
# . Read data from an external file, such as text, JSON, CSV, etc and use that data in your application.
# . Visualize data in a graph, chart, or other visual representation of data
# . Create and call at least 3 functions or methods, at least one of which must return a value that is used 
# somewhere else in your code. To clarify, at least one function should be called in your code, that function 
# should calculate, retrieve, or otherwise set the value of a variable or data structure, return a value to 
# where it was called, and use that value somewhere else in your code.

#----------------------------------------------------
# Additional Features
#----------------------------------------------------
# . Use pandas, matplotlib, and/or numpy to perform a data analysis project. Ingest 2 or more pieces of data, 
# analyze that data in some manner, and display a new result to a graph, chart, or other display

In [2]:
# Imports 'pandas' into the script.
import pandas as pd

In [3]:
# Reads in data from a local csv file and creates a DataFrame (df), which is assigned to 'df1'.
df1 = pd.read_csv('data/Invasive_Plants_List - Sheet1.csv', encoding='utf8')

# Displays the first 5 rows of the df to see what I'm working with.
df1.head()

Unnamed: 0,Scientific_Name,Subject_Name,Family,US_Nativity
0,Acer platanoides L.,Norway maple,Aceraceae,Exotic
1,Achyranthes japonica (Miq.) Nakai,Japanese chaff flower,Amaranthaceae,Exotic
2,Agrostis stolonifera L.,creeping bentgrass,Poaceae,Exotic
3,Ailanthus altissima (P. Mill.) Swingle,tree-of-heaven,Simaroubaceae,Exotic
4,Akebia quinata (Houtt.) Dcne.,chocolate vine,Lardizabalaceae,Exotic


In [4]:
#----------------------------------------------------
# Cleaning and manipulating for df1. 
#----------------------------------------------------

# This changes the col name from 'Subject_Name' to 'Common_Name'.
# I did this so both dfs use the same col name, which should make it easier to merge them later.
df1.rename(columns={
    'Subject_Name':'Common_Name'},
          inplace=True)

# Displays the first 5 rows of the df after it's been cleaned. 
df1.head()

Unnamed: 0,Scientific_Name,Common_Name,Family,US_Nativity
0,Acer platanoides L.,Norway maple,Aceraceae,Exotic
1,Achyranthes japonica (Miq.) Nakai,Japanese chaff flower,Amaranthaceae,Exotic
2,Agrostis stolonifera L.,creeping bentgrass,Poaceae,Exotic
3,Ailanthus altissima (P. Mill.) Swingle,tree-of-heaven,Simaroubaceae,Exotic
4,Akebia quinata (Houtt.) Dcne.,chocolate vine,Lardizabalaceae,Exotic


In [5]:
# Reads in df2 which is saved in the 'data' folder of Invasive_Plants.
df2 = pd.read_csv('data/Invasive_Plants_Threat_Levels - Sheet1.csv', encoding='utf8')


# Again, displaying the first 5 rows of the df to get a look at the data.
df2.head()

Unnamed: 0,Scientific_Name,Common_Name,Threat_Level
0,Achyranthes japonica,Japanese chaff flower,1
1,Ailanthus altissima,tree-of-heaven,1
2,Alliaria petiolata,garlic mustard,1
3,Ampelopsis brevipedunculata,porcelain berry,1
4,Arthraxon hispidus,hairy jointgrass,1


In [6]:
#----------------------------------------------------
# Cleaning and manipulating for df2.
#----------------------------------------------------

# This alphabetizes the df by the 'Scientific_Name' col. Again, this might help with merging the two dfs later.
# sort_values returns a new df, so the result is assigned back to df2.
df2 = df2.sort_values(by=['Scientific_Name'])

# Again, just making sure the change took place with the first few rows.
df2.head()

Unnamed: 0,Scientific_Name,Common_Name,Threat_Level
0,Achyranthes japonica,Japanese chaff flower,1
1,Ailanthus altissima,tree-of-heaven,1
2,Alliaria petiolata,garlic mustard,1
3,Ampelopsis brevipedunculata,porcelain berry,1
4,Arthraxon hispidus,hairy jointgrass,1
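
One detail worth noting about the sort step: `DataFrame.sort_values` returns a new, sorted DataFrame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed). A minimal sketch of the difference, using a small frame built from names in the output above (`threats` and `sorted_threats` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Small sample frame, deliberately out of alphabetical order.
threats = pd.DataFrame({
    "Scientific_Name": ["Arthraxon hispidus", "Achyranthes japonica", "Alliaria petiolata"],
    "Threat_Level": [1, 1, 1],
})

# Calling sort_values without assigning the result leaves `threats` unchanged.
threats.sort_values(by="Scientific_Name")

# Assigning the result keeps the sorted copy; reset_index renumbers the rows 0..n-1.
sorted_threats = threats.sort_values(by="Scientific_Name").reset_index(drop=True)
```

Resetting the index is optional, but it matters for any later step that lines rows up by index label.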


In [8]:
#----------------------------------------------------
# Merging/joining the dfs to make main df.
#----------------------------------------------------

##NOTE##
# df1 nunique = 176
# df2 nunique = 163

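# Here df1 is combined with df2. combine_first aligns the two dfs on their row index (not on plant name),
# keeps df1's values, and fills anything df1 is missing (such as 'Threat_Level') from df2; unmatched rows get NaN.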
combo_df = df1.combine_first(df2)

combo_df

Unnamed: 0,Common_Name,Family,Scientific_Name,Threat_Level,US_Nativity
0,Norway maple,Aceraceae,Acer platanoides L.,1.0,Exotic
1,Japanese chaff flower,Amaranthaceae,Achyranthes japonica (Miq.) Nakai,1.0,Exotic
2,creeping bentgrass,Poaceae,Agrostis stolonifera L.,1.0,Exotic
3,tree-of-heaven,Simaroubaceae,Ailanthus altissima (P. Mill.) Swingle,1.0,Exotic
4,chocolate vine,Lardizabalaceae,Akebia quinata (Houtt.) Dcne.,1.0,Exotic
...,...,...,...,...,...
171,common periwinkle,Apocynaceae,Vinca minor L.,,Exotic
172,Japanese wisteria,Fabaceae (Leguminosae),Wisteria floribunda (Willd.) DC.,,Exotic
173,Chinese wisteria,Fabaceae (Leguminosae),Wisteria sinensis (Sims) DC.,,Exotic
174,Wisteria floribunda x sinensis hybrid,Fabaceae (Leguminosae),Wisteria x formosa [floribunda x sinensis],,Exotic
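
`combine_first` lines rows up by index label, so in the output above each `Threat_Level` comes from whichever df2 row shares the same row number, not necessarily the same plant. A possible alternative, sketched here on a few rows copied from the two `head()` outputs, is to merge on a shared key. The `Binomial` column and the `binomial` helper are assumptions introduced for illustration: they keep the first two words of `Scientific_Name`, which would need extra handling for hybrids such as `Wisteria x formosa`.

```python
import pandas as pd

# A few rows copied from the df1 and df2 previews (sample data only).
df1_sample = pd.DataFrame({
    "Scientific_Name": [
        "Acer platanoides L.",
        "Achyranthes japonica (Miq.) Nakai",
        "Ailanthus altissima (P. Mill.) Swingle",
    ],
    "Common_Name": ["Norway maple", "Japanese chaff flower", "tree-of-heaven"],
    "Family": ["Aceraceae", "Amaranthaceae", "Simaroubaceae"],
    "US_Nativity": ["Exotic", "Exotic", "Exotic"],
})
df2_sample = pd.DataFrame({
    "Scientific_Name": ["Achyranthes japonica", "Ailanthus altissima"],
    "Common_Name": ["Japanese chaff flower", "tree-of-heaven"],
    "Threat_Level": [1, 1],
})


def binomial(name: str) -> str:
    """Hypothetical helper: keep only the genus and species words."""
    return " ".join(name.split()[:2])


# Build the same key in both frames so author citations ("L.", "Nakai") do not block a match.
df1_sample["Binomial"] = df1_sample["Scientific_Name"].map(binomial)
df2_sample["Binomial"] = df2_sample["Scientific_Name"].map(binomial)

# A left merge keeps every df1 row; plants missing from df2 get NaN for Threat_Level.
merged = df1_sample.merge(
    df2_sample[["Binomial", "Threat_Level"]], on="Binomial", how="left"
)
```

In this sample, Norway maple has no row in `df2_sample`, so its `Threat_Level` stays NaN instead of borrowing a value from another plant.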


In [12]:
#----------------------------------------------------
# Cleaning and manipulating the combo_df.
#----------------------------------------------------

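# combine_first returns a single 'Common_Name' col (df1's values win where both dfs have one),
# so no extra step is needed to combine name cols.
# The groupby call is left commented out: its result was never assigned, so it did not change combo_df.
# combo_df.groupby(combo_df.Common_Name, axis=1).sum()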

# Display all the rows to see what's now been filled with NaN.
# These rows and any duplicated data will be removed later.
pd.set_option('display.max_rows', 176)

# Drop all rows that include NaN that line up within 'Threat_Level' col.
combo_df.dropna(subset=['Threat_Level'], inplace=True)

# TODO: Change the floats in the 'Threat_Level' col to integers.

# Lists how many species are in each family. 
top_invasive_family = combo_df['Family'].value_counts()
top_invasive_family

Poaceae                   31
Fabaceae (Leguminosae)    15
Asteraceae                13
Rosaceae                   9
Lamiaceae                  9
Polygonaceae               6
Brassicaceae               6
Apiaceae                   5
Caprifoliaceae             5
Berberidaceae              3
Convolvulaceae             3
Ranunculaceae              3
Celastraceae               3
Moraceae                   3
Elaeagnaceae               2
Amaryllidaceae             2
Sapindaceae                2
Hydrocharitaceae           2
Scrophulariaceae           2
Caryophyllaceae            2
Rhamnaceae                 2
Dipsacaceae                2
Amaranthaceae              2
Najadaceae                 1
Araliaceae                 1
Aceraceae                  1
Primulaceae                1
Iridaceae                  1
Lardizabalaceae            1
Hydrangeaceae              1
Cymbellaceae               1
Clusiaceae                 1
Oleaceae                   1
Oxalidaceae                1
Euphorbiaceae 
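
The `Threat_Level` values show up as floats (`1.0`) because pandas stores a numeric column that contains NaN as float. One way to get integers, sketched under the assumption that missing values should be kept rather than dropped, is the nullable `Int64` dtype; once the NaN rows have been removed with `dropna`, a plain `astype(int)` would also work. The `threat` Series below is sample data, not the notebook's column.

```python
import pandas as pd

# Sample column: two known threat levels and one missing value (stored as float64).
threat = pd.Series([1.0, 1.0, None], name="Threat_Level")

# Nullable integer dtype: whole numbers stay integers, missing values become <NA>.
threat_nullable = threat.astype("Int64")

# After dropping the missing rows, an ordinary integer dtype is enough.
threat_int = threat.dropna().astype(int)
```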

In [13]:
# Shows just the top 5 invasive family groups. 
top_invasive_family.nlargest(5).index

Index(['Poaceae', 'Fabaceae (Leguminosae)', 'Asteraceae', 'Rosaceae',
       'Lamiaceae'],
      dtype='object')

In [16]:
# Now display the data for the plants in the top 5 invasive families.
top_family = combo_df[combo_df['Family'].isin(top_invasive_family.nlargest(5).index)].copy()
top_family

Unnamed: 0,Common_Name,Family,Scientific_Name,Threat_Level,US_Nativity
2,creeping bentgrass,Poaceae,Agrostis stolonifera L.,1.0,Exotic
5,mimosa,Fabaceae (Leguminosae),Albizia julibrissin Durazz.,1.0,Exotic
12,common burdock,Asteraceae,Arctium minus Bernh.,1.0,Exotic
14,mugwort,Asteraceae,Artemisia vulgaris L.,1.0,Exotic
15,"small carpetgrass, joint-head grass",Poaceae,Arthraxon hispidus (Thunb.) Makino,1.0,Exotic
16,giant reed,Poaceae,Arundo donax L.,1.0,Exotic
19,field brome,Poaceae,Bromus arvensis L.,1.0,Exotic
20,rescuegrass,Poaceae,Bromus catharticus Vahl,1.0,Exotic
21,soft brome,Poaceae,Bromus hordeaceus L.,1.0,Exotic
22,smooth brome,Poaceae,Bromus inermis Leyss.,1.0,Exotic


In [9]:
#----------------------------------------------------
# Other To-Dos
#----------------------------------------------------

# After cleaning the 'combo_df' create new CSV of cleaned data.

# Separate plant family groups and pick the 3 largest groups. (Use pandas collab book for ex.)
# Use this info for visualizations.

# This line looks at the 'Family' col within the main_df. 
# main_df['Family'].unique()

# Creates a new variable which holds the values of the 'Family' col.
# fam_group = main_df['Family'].value_counts()

# Selects the top 5 largest groups.
# fam_group.nlargest(5).index

# Selects the data from the top 5 family groups.
# main_df = main_df[main_df['Family'].isin(fam_group.nlargest(5).index)].copy()

In [10]:
# Imports 'seaborn' into the script.
import seaborn as sns

In [11]:
#----------------------------------------------------
# TEST
#----------------------------------------------------

# From the seaborn docs, used to make sure seaborn works as it's supposed to within this notebook.
# df = sns.load_dataset("penguins")
# sns.pairplot(df, hue="species")

In [12]:
# Return visualizations for data. 
# Main one showing an overview of all plants from least to most invasive species.
# Threat levels on x-axis, numbers in groups of 10, on y-axis.
# Second one being family groups by number of species within each family.
# Family names on x-axis, numbers in groups of 10, on y-axis.
# Third, and maybe most impressive, will be a bar chart showing native vs. exotic species.
# Plot graph in shades of greens, yellow, orange, and red. Green being on the watch list, red being severe.
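
The second chart in the plan above (family names on the x-axis, species counts in steps of 10 on the y-axis) could be sketched roughly as follows. The counts are the top five values from the `value_counts()` output earlier in the notebook; the variable names, the colour, and the title are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Top five families and their species counts, copied from the value_counts() output.
family_counts = pd.Series({
    "Poaceae": 31,
    "Fabaceae (Leguminosae)": 15,
    "Asteraceae": 13,
    "Rosaceae": 9,
    "Lamiaceae": 9,
})

fig, ax = plt.subplots(figsize=(8, 4))

# One bar per family; bar height is the number of species in that family.
ax.bar(family_counts.index, family_counts.values, color="seagreen")

ax.set_xlabel("Family")
ax.set_ylabel("Number of species")
ax.set_title("Invasive species per family (top 5)")

# Y-axis ticks in groups of 10, as described in the plan.
ax.set_yticks(range(0, 41, 10))
ax.tick_params(axis="x", rotation=30)
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a plain script, `plt.show()` or `fig.savefig(...)` would be needed.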

In [13]:
#----------------------------------------------------
# Dictionary
#----------------------------------------------------

# Use data to create a dict so a user can enter a plant name and find its invasiveness, family group, etc.

import Invasive_Plant_Dict
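
The contents of `Invasive_Plant_Dict` are not shown here, so the following is only a sketch of one way the dictionary and lookup function described above might work. The sample rows come from the previews earlier in the notebook; `plant_lookup` and `find_plant` are hypothetical names.

```python
import pandas as pd

# Two sample rows assembled from the df1 and df2 previews.
plants = pd.DataFrame({
    "Common_Name": ["Japanese chaff flower", "tree-of-heaven"],
    "Scientific_Name": ["Achyranthes japonica", "Ailanthus altissima"],
    "Family": ["Amaranthaceae", "Simaroubaceae"],
    "Threat_Level": [1, 1],
})

# Key each plant by its lower-cased common name: {name: {column: value, ...}}.
plant_lookup = plants.set_index(plants["Common_Name"].str.lower()).to_dict(orient="index")


def find_plant(name, lookup):
    """Return the details dict for a plant, or None if the name is not found."""
    return lookup.get(name.strip().lower())


# The returned value is used elsewhere, for example to print a short summary.
details = find_plant("Tree-of-Heaven", plant_lookup)
if details is not None:
    print(f"{details['Common_Name']}: family {details['Family']}, threat level {details['Threat_Level']}")
```

Lower-casing and stripping the input makes the lookup tolerant of capitalisation and stray spaces in what the user types.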

#### Possible Future Features

In [14]:
# Create a method so a user can view the visualization for all plants at a single threat level.
# Gather more data to show growth rates over time. (Fingers crossed on that one.)
# Gather more data to find the top 5, 10 & 50 most invasive species.
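
As a starting point for the first idea, selecting the plants at one threat level could look something like this. It is a sketch only: `plants_at_threat_level` is a hypothetical helper, and the sample rows are the five shown in the `df2.head()` output, all of which have a threat level of 1.

```python
import pandas as pd

# Sample rows copied from the df2 preview.
threat_df = pd.DataFrame({
    "Common_Name": [
        "Japanese chaff flower",
        "tree-of-heaven",
        "garlic mustard",
        "porcelain berry",
        "hairy jointgrass",
    ],
    "Threat_Level": [1, 1, 1, 1, 1],
})


def plants_at_threat_level(df, level):
    """Return only the rows whose Threat_Level equals the requested level."""
    return df[df["Threat_Level"] == level].copy()


level_one = plants_at_threat_level(threat_df, 1)
level_two = plants_at_threat_level(threat_df, 2)  # empty for this sample
```

The filtered frame could then be passed to whichever plotting function ends up drawing the threat-level chart.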