Get ready for a buzzworthy event as we bring together our beloved bees and scientific names in a data-driven dance! Today, we’re not just talking about pollen and nectar: we’re talking about the sweet harmony of concatenating bee datasets and then merging them into one vibrant dataset.

Just like a bee flits from blossom to blossom, we’ll be combining our datasets to create a hive of insights. Whether it’s matching the perfect pollinator with its scientific namesake or finding out where the buzz is really happening, this bee-floral fusion will reveal the secrets of the garden in ways we’ve never seen before.

In [17]:
import pandas as pd

# North American bumblebee decline records
df1 = pd.read_csv('/workspaces/myfolder/SASPythonDataScientists/pattern_decline_N_American_Bumblebees.csv', encoding='latin-1')
# Mexican bumblebee decline records
df2 = pd.read_csv('/workspaces/myfolder/SASPythonDataScientists/pattern_decline_Mexican_Bumblebees.csv', encoding='latin-1')
# Scientific and common names
df3 = pd.read_csv('/workspaces/myfolder/SASPythonDataScientists/Bumblebee_Others_Scientific_Common_Names.csv', encoding='latin-1')
# Native vs. non-native bumblebee sightings
df4 = pd.read_csv('/workspaces/myfolder/SASPythonDataScientists/native_vs_nonnative_bumblebee_sighting_pollinators_of_farm_data_for_publication.csv', encoding='latin-1')


## Python Data Preparation

##### Concatenate two dataframes to combine North American (excluding Alaska) and Mexican bumblebees

Take a quick look at the dimensions of the two dataframes we are about to concatenate.

In [18]:
# North American bumblebee decline dataframe
df1.shape

(66907, 26)

In [19]:
# Mexican bumblebee decline dataframe
df2.shape

(24, 26)

Concatenation stitches dataframes together along an axis: either the row axis (stacking rows on top of each other) or the column axis (placing columns side by side).

Use pd.concat() and pass it a list of the DataFrames that you want to concatenate. The code for this task is below.

In [20]:
dfconc=pd.concat([df1,df2])

In [21]:
dfconc.shape

(66931, 26)
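
To make the two axes concrete, here is a minimal sketch on toy data. The frames north and mexico and their values are hypothetical stand-ins, not the real bumblebee files. It also shows ignore_index=True, an optional argument of pd.concat() that renumbers the rows after stacking.

```python
import pandas as pd

# Hypothetical toy frames standing in for the two bumblebee tables
north = pd.DataFrame({"scientificname": ["Bombus affinis", "Bombus impatiens"],
                      "year": [1998, 2005]})
mexico = pd.DataFrame({"scientificname": ["Bombus ephippiatus"],
                       "year": [2001]})

# Row-wise concatenation (axis=0, the default): rows are stacked
stacked = pd.concat([north, mexico])
print(stacked.shape)            # (3, 2)
print(list(stacked.index))      # [0, 1, 0] -> original row labels are kept

# ignore_index=True renumbers the rows from 0 to n-1
renumbered = pd.concat([north, mexico], ignore_index=True)
print(list(renumbered.index))   # [0, 1, 2]

# Column-wise concatenation (axis=1): frames sit side by side,
# aligned on their row labels
side_by_side = pd.concat([north, mexico], axis=1)
print(side_by_side.shape)       # (2, 4)
```

Row-wise stacking adds up the row counts and leaves the column count unchanged, which is the pattern seen in the shapes above.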

## Merging data frames

We're diving into the world of bumblebees by buzzing through some data magic in Python! Imagine we've got one table that's packed with records of our favorite fuzzy pollinators, each tagged with its scientific name, and another that pairs those scientific names with common names. By merging the two into one tidy table, we're basically creating the ultimate bee database, bringing together the familiar and the formal. It's like giving each bee its proper name tag at the hive party! This way, we can easily connect the dots between the Latin and the layman's terms, making our bumblebee data analysis as sweet as honey. 🐝💻

In [None]:
list(df4)

In [None]:
print(df4)

In [None]:
df4.describe()

Take a quick look at the dimensions of the tables we are about to merge

In [None]:
df4.shape
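
The inspection steps above can be sketched on a small frame. The sightings table here is a hypothetical stand-in. One detail worth noting: shape is an attribute, while describe is a method, so it needs parentheses to actually compute the summary.

```python
import pandas as pd

# Hypothetical stand-in for a sightings table
sightings = pd.DataFrame({
    "species": ["Bombus affinis", "Bombus impatiens", "Bombus affinis"],
    "count": [3, 5, 2],
})

columns = list(sightings)          # column names as a plain list
n_rows, n_cols = sightings.shape   # attribute: no parentheses
summary = sightings.describe()     # method: parentheses required

print(columns)
print(n_rows, n_cols)
print(summary.loc["mean", "count"])
```

Without the parentheses, sightings.describe only returns a reference to the method itself rather than the summary statistics.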

When working with our two bumblebee tables—one buzzing with scientific names and the other humming with common names—Python's merge() function is like a matchmaker for your data. The great thing about merge() is that it lets you decide exactly how these two tables come together. Say you want to merge them based on the ScientificName column, ensuring that each bee's formal identity pairs up perfectly with its everyday nickname. By using the on parameter, you can create the ultimate bee directory where the Latin meets the common, all while keeping your data as sharp as a bee's stinger! 🐝🔗

In the world of pandas, DataFrames have a merge() method with functionality similar to SAS joins. No need to sort ahead of time: perform all kinds of different joins by simply using the how keyword. It’s like a hive of possibilities for your data!
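
As a rough sketch of what the on and how parameters do, the example below merges two toy tables. The frames records and names and their rows are made up for illustration and do not come from the bumblebee files.

```python
import pandas as pd

# Hypothetical records keyed by scientific name
records = pd.DataFrame({
    "scientificname": ["Bombus affinis", "Bombus impatiens", "Bombus sp."],
    "year": [1998, 2005, 2010],
})
# Hypothetical lookup table of common names
names = pd.DataFrame({
    "scientificname": ["Bombus affinis", "Bombus impatiens", "Bombus terricola"],
    "commonname": ["Rusty patched bumble bee", "Common eastern bumble bee",
                   "Yellow-banded bumble bee"],
})

# inner: only keys found in both tables
inner = records.merge(names, on="scientificname", how="inner")
# left: every row of records, with NaN where no common name matches
left = records.merge(names, on="scientificname", how="left")
# outer: every key from either table
outer = records.merge(names, on="scientificname", how="outer")

print(len(inner), len(left), len(outer))   # 2 3 4
```

Comparing the row counts of the different joins is a quick way to see how many keys fail to match between the two tables.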

In [39]:
inner_join = dfconc.merge(df3, on=["SCIENTIFICNAME"], how="inner")

KeyError: 'SCIENTIFICNAME'

Column names for dataframes are case sensitive.

Dataframe column names are essentially string values, which are case sensitive in Python. Because of this, you will need to be careful whenever you utilize column names, such as when renaming a column, accessing columns or performing functions on them.
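
A minimal sketch of the issue, using a hypothetical one-row frame: looking up a column with the wrong case raises a KeyError, and lower-casing the header is one way to make later lookups predictable.

```python
import pandas as pd

# Hypothetical frame whose header uses mixed case
bees = pd.DataFrame({"ScientificName": ["Bombus affinis"], "Year": [1998]})

# A lookup must match the stored name exactly, including case
try:
    bees["SCIENTIFICNAME"]
except KeyError as err:
    print("KeyError:", err)

# Normalise the header to lower case
bees.columns = bees.columns.str.lower()
print(list(bees.columns))   # ['scientificname', 'year']
```

Applying the same normalisation to both tables before a merge means the key column can be written the same way on each side.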

In [40]:
dfconc.columns = dfconc.columns.str.lower()

In [41]:
list(dfconc)

['id',
 'institutioncode',
 'collectioncode',
 'basisofrecord',
 'occurrenceid',
 'catalognumber',
 'recordedby',
 'year',
 'month',
 'day',
 'country',
 'stateprovince',
 'county',
 'locality',
 'verbatimlatitude',
 'verbatimlongitude',
 'identifiedby',
 'scientificname',
 'kingdom',
 'phylum',
 'class',
 'order',
 'family',
 'genus',
 'specificepithet',
 'scientificnameauthorship']

In [42]:
df3.columns = df3.columns.str.lower()

In [43]:
list(df3)

['scientificname',
 'species',
 'specificepithet',
 'commonname',
 'description',
 'source']

In [44]:
df_inner = dfconc.merge(df3, on=["scientificname"], how="inner")

In [45]:
df_inner.describe

<bound method NDFrame.describe of           id institutioncode collectioncode      basisofrecord  occurrenceid  \
0          1        USDA-ARS           BBSL  PreservedSpecimen   699384987.0   
1          2        USDA-ARS           BBSL  PreservedSpecimen   699384988.0   
2          3        USDA-ARS           BBSL  PreservedSpecimen   699384989.0   
3          4        USDA-ARS           BBSL  PreservedSpecimen   699384990.0   
4          5        USDA-ARS           BBSL  PreservedSpecimen   699384991.0   
...      ...             ...            ...                ...           ...   
66926  66927        USDA-ARS           BBSL  PreservedSpecimen           NaN   
66927  66928        USDA-ARS           BBSL  PreservedSpecimen           NaN   
66928  66929        USDA-ARS           BBSL  PreservedSpecimen           NaN   
66929  66930        USDA-ARS           BBSL  PreservedSpecimen           NaN   
66930  66931        USDA-ARS           BBSL  PreservedSpecimen           NaN   

     