# Combining DataFrames With Pandas

## Introduction

In this section, you'll learn how to combine DataFrames with concatenation. You'll also learn about the various types of joins that exist and how you can perform them in pandas with `pd.merge()` and `.join()`.

## Objectives
You will be able to:
* Understand and explain when to use DataFrame joins and merges
* Use `pd.merge()` to combine DataFrames based on column values
* Understand, explain and use a range of DataFrame merge types: outer, inner, left and right
* Use pd.concat() to stack DataFrames


## Concatenating DataFrames

Recall that "concatenation" means adding the contents of a second collection on to the end of the first collection.  You learned how to do this when working with strings.  For instance:

```python
print('Data ' + 'Science!')
# Output: "Data Science!"
```
Since strings are a kind of collection in Python, you can concatenate them as above.

DataFrames are also collections, so it stands to reason that pandas provides an easy way to concatenate them.  Examine the following diagram from the pandas documentation on concatenation:

<img src='./images/Image_197_concat.png'>

In this example, 3 DataFrames have been concatenated, resulting in one larger DataFrame containing the contents in the order they were concatenated.

To concatenate 2 or more DataFrames, you pass a list of the objects to concatenate to the `pd.concat()` function, as demonstrated below:

```python
to_concat = [df1, df2, df3]
big_df = pd.concat(to_concat)
```

Note that there are many different optional keyword arguments you can set with `pd.concat()`. For a full breakdown of all the ways you can use this function, take a look at the [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/merging.html).
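
As a small illustration of two of those keyword arguments, the sketch below uses `ignore_index` and `keys`. The DataFrames `first` and `second` and their contents are made up for this example and are not part of the lesson's data.

```python
import pandas as pd

# Two small illustrative frames (hypothetical data), each with the default 0..n-1 index
first = pd.DataFrame({"name": ["Ann", "Ben"], "age": [31, 45]})
second = pd.DataFrame({"name": ["Cat", "Dan"], "age": [28, 52]})

# Default behaviour: the original row labels are kept, so 0 and 1 each appear twice
stacked = pd.concat([first, second])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers the rows
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]

# keys= labels each input, producing a two-level index you can select from
labelled = pd.concat([first, second], keys=["first", "second"])
print(labelled.loc["second"])
```

Renumbering is usually what you want when the old row labels carry no meaning, while `keys` is handy when you need to remember which input each row came from.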



# Concatenate Dataframe

In [1]:
import pandas as pd
import numpy as np

In [2]:
columns = ['name', 'age', 'gender', 'job']
user1 = pd.DataFrame([['Giulia', 19, "F", "Actress"],
                      ['Josh', 26, "M", "Data Scientist"]],                  
                       columns=columns)
user1

Unnamed: 0,name,age,gender,job
0,Giulia,19,F,Actress
1,Josh,26,M,Data Scientist


In [3]:
user2 = pd.DataFrame([['Elisa', 22, "M", "student"],
                    ['Pablo', 58, "F", "Architect"]],
                     columns=columns)
user2

Unnamed: 0,name,age,gender,job
0,Elisa,22,M,student
1,Pablo,58,F,Architect


In [4]:
user3 = pd.DataFrame(dict(name=['Pietro', 'Saltini'],
                  age=[33, 44], gender=['M', 'F'],
                  job=['Farmer', 'Scientist']))
user3

Unnamed: 0,name,age,gender,job
0,Pietro,33,M,Farmer
1,Saltini,44,F,Scientist


In [16]:
# DataFrame.append() was removed in pandas 2.0; pd.concat() covers the same use case
total_users = pd.concat([user1, user2, user3])
total_users

Unnamed: 0,name,age,gender,job
0,Giulia,19,F,Actress
1,Josh,26,M,Data Scientist
0,Elisa,22,M,student
1,Pablo,58,F,Architect
0,Pietro,33,M,Farmer
1,Saltini,44,F,Scientist


### Keys and Indexes

Every table in a database typically has a column that serves as the **_Primary Key_**. In pandas, the index plays the role of the primary key for a DataFrame. You'll use these keys, along with the **_Foreign Key_**, which points to a primary key value in another table, to execute **_Joins_**. This lets you "line up" information from multiple tables and combine it into one table. You'll learn more about Primary Keys and Foreign Keys in the near future when you dive into SQL and relational databases, so don't worry too much about these concepts now. That said, you can use similar functionality in pandas.

Often, it is useful for us to set a column to act as the index for a DataFrame.  To do this, you would type:

```python
some_dataframe.set_index("name_of_index_column", inplace=True)
```

Note that this will mutate the dataset in place and set the column with the specified name as the index column of the DataFrame.  If `inplace` is not specified it will default to False, meaning that a copy of the DataFrame with the requested changes will be returned, but the original object will remain unchanged. 

**_NOTE:_** Running cells that make an `inplace` change more than once will often cause pandas to throw an error.  If this happens, just restart the kernel.

By setting the index columns on DataFrames, you make it easy to join DataFrames later on. Note that this is not always feasible, but it's a useful step when possible.
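
The sketch below illustrates the difference between the returned copy and the original, and why running a `set_index` cell twice fails. The DataFrame `people` is made-up example data.

```python
import pandas as pd

people = pd.DataFrame({"name": ["Ann", "Ben"], "age": [31, 45]})

# Without inplace=True, set_index returns a new DataFrame and leaves `people` unchanged
indexed = people.set_index("name")
print(people.columns.tolist())    # ['name', 'age']
print(indexed.index.tolist())     # ['Ann', 'Ben']

# Once "name" is the index it is no longer a column, so a second call fails
second_call_failed = False
try:
    indexed.set_index("name")
except KeyError as err:
    second_call_failed = True
    print("KeyError:", err)

# reset_index() moves the index back into an ordinary column
restored = indexed.reset_index()
print(restored.columns.tolist())  # ['name', 'age']
```

If you hit that `KeyError` in a notebook, calling `reset_index()` is a lighter-weight alternative to restarting the kernel.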

### Types of Joins

Joins are always executed between a **_Left Table_** and a **_Right Table_**.  There are four different types of Joins you can execute.  Consider the following Venn Diagrams:

<img src='./images/Image_198_joins.png'>

When thinking about Joins, it is easy to conceptualize them as Venn Diagrams.  

An **_Outer Join_** returns all records from both tables. 

An **_Inner Join_** returns only the records with matching keys in both tables.

A **_Left Join_** returns all the records from the left table, as well as any records from the right table that have a matching key with a record from the left table.

A **_Right Join_** returns all the records from the right table, as well as any records from the left table that have a matching key with a record from the right table. 

DataFrames have a built-in `.join()` method, which by default matches rows on the index. The table calling the `.join()` method is always the left table. The following code snippet demonstrates how to execute a join in pandas:

```python
joined_df = df1.join(df2, how='inner')
```

Note that to call `.join()`, you must pass in the right table.  You can also set the type of join to perform with the `how` parameter.  The options are `'left'`, `'right'`, `'inner'`, and `'outer'`.

**If** `how=` **is not specified, it defaults to `'left'`.**

**_NOTE:_** If both tables contain columns with the same name, the join will throw an error due to a naming collision, since the resulting table would have multiple columns with the same name.  To solve this, pass in a value to `lsuffix=` or `rsuffix=`, which will append this suffix to the offending columns to resolve the naming collisions. 
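
Here is a minimal sketch of that naming collision and its fix. The frames `left` and `right`, their values, and the suffix strings are all hypothetical choices for this example.

```python
import pandas as pd

# Both frames are indexed by name and both have a "score" column
left = pd.DataFrame({"score": [90, 85]}, index=["Ann", "Ben"])
right = pd.DataFrame({"score": [70, 60]}, index=["Ben", "Cat"])

# Without suffixes, the overlapping column name makes join() raise an error
collision_raised = False
try:
    left.join(right)
except ValueError as err:
    collision_raised = True
    print("ValueError:", err)

# With suffixes, each "score" column gets a distinct name.
# how defaults to 'left', so only Ann and Ben (the left index) are kept.
joined = left.join(right, lsuffix="_left", rsuffix="_right")
print(joined)
```

Ann has no row in `right`, so her `score_right` value is `NaN`, which is exactly the left-join behaviour described above.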



# Join DataFrames

In [13]:
user4 = pd.DataFrame(dict(name=["Giulia","Josh","Elisa","Saltini"],
                         height=[165,180, 170,190]))
user4

Unnamed: 0,name,height
0,Giulia,165
1,Josh,180
2,Elisa,170
3,Saltini,190


### We use intersection of keys from both DataFrames

In [17]:
merge_df = pd.merge(total_users , user4 , on="name")
merge_df

Unnamed: 0,name,age,gender,job,height
0,Giulia,19,F,Actress,165
1,Josh,26,M,Data Scientist,180
2,Elisa,22,M,student,170
3,Saltini,44,F,Scientist,190


### We use union of keys from both DataFrames

In [18]:
new_users = pd.merge(total_users, user4 , on="name" , how="outer")
new_users

Unnamed: 0,name,age,gender,job,height
0,Giulia,19,F,Actress,165.0
1,Josh,26,M,Data Scientist,180.0
2,Elisa,22,M,student,170.0
3,Pablo,58,F,Architect,
4,Pietro,33,M,Farmer,
5,Saltini,44,F,Scientist,190.0


**_NOTE:_** `user4` has no height for Pablo or Pietro, so the outer merge fills the `height` column with `NaN` (a missing value) in those rows.
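
The missing values also explain why the heights print as `165.0` rather than `165` after the outer merge: `NaN` is a floating-point value, so pandas converts the whole column to floats. A minimal sketch with made-up frames `people` and `heights`:

```python
import pandas as pd

people = pd.DataFrame({"name": ["Ann", "Ben", "Cat"], "age": [31, 45, 28]})
heights = pd.DataFrame({"name": ["Ann", "Cat"], "height": [165, 170]})

# Ben has no height, so the outer merge must store NaN for him
merged = pd.merge(people, heights, on="name", how="outer")

print(heights["height"].dtype)   # an integer dtype before the merge
print(merged["height"].dtype)    # a float dtype after the merge

# Boolean indexing with isna() picks out the rows that had no match
print(merged[merged["height"].isna()])
```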

Now that we have seen a bit more about joins, let's take a closer look at the types of joins.

# One-to-one joins
+ Perhaps the simplest type of merge expression is the one-to-one join, which is in many ways very similar to column-wise concatenation. As a concrete example, consider the following two DataFrames, which contain information on several employees in a company:

In [37]:
data1 = pd.DataFrame({'Employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'Department': ['Accounting', 'Engineering', 'Engineering', 'HR']})
data2 = pd.DataFrame({'Employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'Hire_date': [2004, 2008, 2012, 2014]})
display(data1, data2)

Unnamed: 0,Employee,Department
0,Bob,Accounting
1,Jake,Engineering
2,Lisa,Engineering
3,Sue,HR


Unnamed: 0,Employee,Hire_date
0,Lisa,2004
1,Bob,2008
2,Jake,2012
3,Sue,2014


To combine this information into a single DataFrame, we can use the `pd.merge()` function:

In [38]:
data3 = pd.merge(data1 , data2)
data3.head()

Unnamed: 0,Employee,Department,Hire_date
0,Bob,Accounting,2008
1,Jake,Engineering,2012
2,Lisa,Engineering,2004
3,Sue,HR,2014


### Note: 
+ The `pd.merge()` function recognizes that each DataFrame has an `Employee` column, and automatically joins using this column as a key.
+ The result of the merge is a new DataFrame that combines the information from the two inputs. Notice that the order of entries in each column is not necessarily maintained: in this case, the order of the `Employee` column differs between `data1` and `data2`, and `pd.merge()` correctly accounts for this. Additionally, keep in mind that the merge in general discards the index, except in the special case of merges by index (see the `left_index` and `right_index` keywords, discussed below).
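
To see the index being discarded, the sketch below gives the two inputs deliberately unusual row labels. The frames `left` and `right` and their index values are invented for this example.

```python
import pandas as pd

# Custom row labels (10, 20 and 7, 8) make it obvious when the index is dropped
left = pd.DataFrame({"Employee": ["Bob", "Jake"],
                     "Department": ["Accounting", "Engineering"]},
                    index=[10, 20])
right = pd.DataFrame({"Employee": ["Jake", "Bob"],
                      "Hire_date": [2012, 2008]},
                     index=[7, 8])

# Merging on the shared "Employee" column ignores both indexes
merged = pd.merge(left, right)

print(merged.index.tolist())  # [0, 1]
print(merged)
```

The rows are matched by name even though `right` lists them in a different order, and the result gets a fresh 0-based index.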
    

# Many-to-one joins
Many-to-one joins are joins in which one of the two key columns contains duplicate entries. For the many-to-one case, the resulting DataFrame will preserve those duplicate entries as appropriate. 
Consider the following example of a many-to-one join:

In [40]:
data4 = pd.DataFrame({'Department': ['Accounting', 'Engineering', 'HR'],
                    'Supervisor': ['Carly', 'Guido', 'Steve']})
display(data3, data4, pd.merge(data3, data4))

Unnamed: 0,Employee,Department,Hire_date
0,Bob,Accounting,2008
1,Jake,Engineering,2012
2,Lisa,Engineering,2004
3,Sue,HR,2014


Unnamed: 0,Department,Supervisor
0,Accounting,Carly
1,Engineering,Guido
2,HR,Steve


Unnamed: 0,Employee,Department,Hire_date,Supervisor
0,Bob,Accounting,2008,Carly
1,Jake,Engineering,2012,Guido
2,Lisa,Engineering,2004,Guido
3,Sue,HR,2014,Steve


The resulting DataFrame has an additional column with the `Supervisor` information, where the information is repeated in one or more locations as required by the inputs.

## Many-to-many joins
Many-to-many joins are a bit confusing conceptually, but are nevertheless well defined. If the key column in both the left and right DataFrames contains duplicates, then the result is a many-to-many merge. This is perhaps clearest with a concrete example. Consider the following, where we have a DataFrame showing one or more skills associated with a particular department. By performing a many-to-many join, we can recover the skills associated with any individual person:

In [41]:
data5 = pd.DataFrame({'Department': ['Accounting', 'Accounting',
                              'Engineering', 'Engineering', 'HR', 'HR'],
                    'skills': ['math', 'spreadsheets', 'coding', 'linux',
                               'spreadsheets', 'organization']})
display(data1, data5, pd.merge(data1, data5))

Unnamed: 0,Employee,Department
0,Bob,Accounting
1,Jake,Engineering
2,Lisa,Engineering
3,Sue,HR


Unnamed: 0,Department,skills
0,Accounting,math
1,Accounting,spreadsheets
2,Engineering,coding
3,Engineering,linux
4,HR,spreadsheets
5,HR,organization


Unnamed: 0,Employee,Department,skills
0,Bob,Accounting,math
1,Bob,Accounting,spreadsheets
2,Jake,Engineering,coding
3,Jake,Engineering,linux
4,Lisa,Engineering,coding
5,Lisa,Engineering,linux
6,Sue,HR,spreadsheets
7,Sue,HR,organization



+ These three types of joins can be used with other Pandas tools to implement a wide array of functionality. But in practice, datasets are rarely as clean as the one we're working with here. In the following section we'll consider some of the options provided by pd.merge() that enable you to tune how the join operations work.
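
One such option is the `validate` argument of `pd.merge()`, which checks that the keys really have the relationship you expect and raises a `MergeError` otherwise. The sketch below uses small made-up frames `employees` and `supervisors` modelled on the examples above.

```python
import pandas as pd
from pandas.errors import MergeError

employees = pd.DataFrame({"Employee": ["Bob", "Jake", "Lisa"],
                          "Department": ["Accounting", "Engineering", "Engineering"]})
supervisors = pd.DataFrame({"Department": ["Accounting", "Engineering"],
                            "Supervisor": ["Carly", "Guido"]})

# Many employees per department, one supervisor row per department: the check passes
ok = pd.merge(employees, supervisors, validate="many_to_one")
print(ok)

# Claiming one-to-one is wrong because "Engineering" repeats on the left side
one_to_one_rejected = False
try:
    pd.merge(employees, supervisors, validate="one_to_one")
except MergeError as err:
    one_to_one_rejected = True
    print("MergeError:", err)
```

This is a cheap way to catch accidental many-to-many merges, which can silently multiply the number of rows.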

## Specification of the Merge Key
+ We've already seen the default behavior of pd.merge(): it looks for one or more matching column names between the two inputs, and uses this as the key. However, often the column names will not match so nicely, and pd.merge() provides a variety of options for handling this.

## The on keyword
Most simply, you can explicitly specify the name of the key column using the on keyword, which takes a column name or a list of column names:

In [43]:
display(data1, data2, pd.merge(data1, data2, on='Employee'))


Unnamed: 0,Employee,Department
0,Bob,Accounting
1,Jake,Engineering
2,Lisa,Engineering
3,Sue,HR


Unnamed: 0,Employee,Hire_date
0,Lisa,2004
1,Bob,2008
2,Jake,2012
3,Sue,2014


Unnamed: 0,Employee,Department,Hire_date
0,Bob,Accounting,2008
1,Jake,Engineering,2012
2,Lisa,Engineering,2004
3,Sue,HR,2014


+ This option works only if both the left and right DataFrames have the specified column name.
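
Since `on` also accepts a list of column names, here is a minimal sketch of a merge on two key columns. The frames `sales` and `costs` and their values are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({"store": ["A", "A", "B"],
                      "year": [2020, 2021, 2020],
                      "sales": [100, 120, 90]})
costs = pd.DataFrame({"store": ["A", "B", "B"],
                      "year": [2021, 2020, 2021],
                      "costs": [80, 70, 75]})

# A row matches only when BOTH store and year agree
both = pd.merge(sales, costs, on=["store", "year"])
print(both)
```

Only the (A, 2021) and (B, 2020) combinations appear in both inputs, so the default inner join returns two rows.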


## The left_on and right_on keywords
+ At times you may wish to merge two datasets with different column names; for example, we may have a dataset in which the employee name is labeled as `Name` rather than `Employee`. In this case, we can use the `left_on` and `right_on` keywords to specify the two column names:

In [44]:
data3 = pd.DataFrame({'Name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'salary': [70000, 80000, 120000, 90000]})
display(data1, data3, pd.merge(data1, data3, left_on="Employee", right_on="Name"))

Unnamed: 0,Employee,Department
0,Bob,Accounting
1,Jake,Engineering
2,Lisa,Engineering
3,Sue,HR


Unnamed: 0,Name,salary
0,Bob,70000
1,Jake,80000
2,Lisa,120000
3,Sue,90000


Unnamed: 0,Employee,Department,Name,salary
0,Bob,Accounting,Bob,70000
1,Jake,Engineering,Jake,80000
2,Lisa,Engineering,Lisa,120000
3,Sue,HR,Sue,90000



+ The result has a redundant column that we can drop if desired, for example by using the `drop()` method of DataFrames:

In [45]:
pd.merge(data1, data3, left_on="Employee", right_on="Name").drop('Name', axis=1)

Unnamed: 0,Employee,Department,salary
0,Bob,Accounting,70000
1,Jake,Engineering,80000
2,Lisa,Engineering,120000
3,Sue,HR,90000


## The left_index and right_index keywords
+ Sometimes, rather than merging on a column, you would instead like to merge on an index. For example, your data might look like this:

In [46]:
data1_new= data1.set_index('Employee')
data2_new= data2.set_index('Employee')
display(data1_new, data2_new)

Unnamed: 0_level_0,Department
Employee,Unnamed: 1_level_1
Bob,Accounting
Jake,Engineering
Lisa,Engineering
Sue,HR


Unnamed: 0_level_0,Hire_date
Employee,Unnamed: 1_level_1
Lisa,2004
Bob,2008
Jake,2012
Sue,2014



+ You can use the index as the key for merging by specifying the left_index and/or right_index flags in pd.merge():

In [48]:
display(data1_new, data2_new,
        pd.merge(data1_new, data2_new, left_index=True, right_index=True))

Unnamed: 0_level_0,Department
Employee,Unnamed: 1_level_1
Bob,Accounting
Jake,Engineering
Lisa,Engineering
Sue,HR


Unnamed: 0_level_0,Hire_date
Employee,Unnamed: 1_level_1
Lisa,2004
Bob,2008
Jake,2012
Sue,2014


Unnamed: 0_level_0,Department,Hire_date
Employee,Unnamed: 1_level_1,Unnamed: 2_level_1
Bob,Accounting,2008
Jake,Engineering,2012
Lisa,Engineering,2004
Sue,HR,2014
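
An index-on-index merge like this is also what the `.join()` method does. The sketch below compares the two on small made-up frames, `departments` and `hire_dates`. Note that `.join()` defaults to `how='left'` while `pd.merge()` defaults to `how='inner'`, so `how` is set explicitly here.

```python
import pandas as pd

departments = pd.DataFrame({"Department": ["Accounting", "Engineering"]},
                           index=pd.Index(["Bob", "Jake"], name="Employee"))
hire_dates = pd.DataFrame({"Hire_date": [2012, 2008]},
                          index=pd.Index(["Jake", "Bob"], name="Employee"))

# Explicit index-on-index merge
via_merge = pd.merge(departments, hire_dates, left_index=True, right_index=True)

# The same operation written with join()
via_join = departments.join(hire_dates, how="inner")

print(via_join)
print(via_merge.equals(via_join))
```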


+ If you'd like to mix indices and columns, you can combine left_index with right_on or left_on with right_index to get the desired behavior:

In [50]:
display(data1_new, data3, 
        pd.merge(data1_new, data3, left_index=True, right_on='Name'))

Unnamed: 0_level_0,Department
Employee,Unnamed: 1_level_1
Bob,Accounting
Jake,Engineering
Lisa,Engineering
Sue,HR


Unnamed: 0,Name,salary
0,Bob,70000
1,Jake,80000
2,Lisa,120000
3,Sue,90000


Unnamed: 0,Department,Name,salary
0,Accounting,Bob,70000
1,Engineering,Jake,80000
2,Engineering,Lisa,120000
3,HR,Sue,90000


+ All of these options also work with multiple indices and/or multiple columns; the interface for this behavior is very intuitive. For more information on this, see the [Merge, Join, and Concatenate](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) section of the Pandas documentation.


## Specifying Set Arithmetic for Joins
In all the preceding examples we have glossed over one important consideration in performing a join: the type of set arithmetic used in the join. This comes up when a value appears in one key column but not the other. Consider this example:

In [55]:
data6 = pd.DataFrame({'Name': ['Peter', 'Paul', 'Mary'],
                    'Food': ['fish', 'beans', 'bread']},
                   columns=['Name', 'Food'])
data7 = pd.DataFrame({'Name': ['Mary', 'Joseph'],
                    'Drink': ['wine', 'beer']},
                   columns=['Name', 'Drink'])

In [56]:
display(data6, data7, pd.merge(data6, data7))

Unnamed: 0,Name,Food
0,Peter,fish
1,Paul,beans
2,Mary,bread


Unnamed: 0,Name,Drink
0,Mary,wine
1,Joseph,beer


Unnamed: 0,Name,Food,Drink
0,Mary,bread,wine


+ Here we have merged two datasets that have only a single "Name" entry in common: Mary. By default, the result contains the intersection of the two sets of inputs; this is what is known as an `inner join`. We can specify this explicitly using the how keyword, which defaults to "inner":


## Inner Join

An **_Inner Join_** returns only the records with matching keys in both tables.

<img src='./images/innerjoin.png'>

In [57]:
pd.merge(data6 , data7 , how="inner")

Unnamed: 0,Name,Food,Drink
0,Mary,bread,wine


Other options for the `how` keyword are `'outer'`, `'left'`, and `'right'`. An outer join returns a join over the union of the input keys, and fills in all missing values with `NaN`:

# Outer Join

An **_Outer Join_** returns all records from both tables. 

<img src='./images/outerjoin.png'>

In [58]:
display(data6 , data7 , pd.merge(data6 , data7 , how="outer"))

Unnamed: 0,Name,Food
0,Peter,fish
1,Paul,beans
2,Mary,bread


Unnamed: 0,Name,Drink
0,Mary,wine
1,Joseph,beer


Unnamed: 0,Name,Food,Drink
0,Peter,fish,
1,Paul,beans,
2,Mary,bread,wine
3,Joseph,,beer


The left join and right join return joins over the left entries and right entries, respectively. For example:


# Left Join

A **_Left Join_** returns all the records from the left table, as well as any records from the right table that have a matching key with a record from the left table.


<img src='./images/leftjoin.png'>

In [60]:
display(data6, data7, pd.merge(data6, data7, how='left'))

Unnamed: 0,Name,Food
0,Peter,fish
1,Paul,beans
2,Mary,bread


Unnamed: 0,Name,Drink
0,Mary,wine
1,Joseph,beer


Unnamed: 0,Name,Food,Drink
0,Peter,fish,
1,Paul,beans,
2,Mary,bread,wine


# Right Join

A **_Right Join_** returns all the records from the right table, as well as any records from the left table that have a matching key with a record from the right table.

<img src='./images/rightjoin.png'>

In [61]:
display(data6, data7, pd.merge(data6, data7, how='right'))

Unnamed: 0,Name,Food
0,Peter,fish
1,Paul,beans
2,Mary,bread


Unnamed: 0,Name,Drink
0,Mary,wine
1,Joseph,beer


Unnamed: 0,Name,Food,Drink
0,Mary,bread,wine
1,Joseph,,beer
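
To compare the four join types side by side, the sketch below rebuilds the same food and drink data under the names `foods` and `drinks` (chosen here for readability) and records which names survive each value of `how`.

```python
import pandas as pd

foods = pd.DataFrame({"Name": ["Peter", "Paul", "Mary"],
                      "Food": ["fish", "beans", "bread"]})
drinks = pd.DataFrame({"Name": ["Mary", "Joseph"],
                       "Drink": ["wine", "beer"]})

# Collect the names kept by each join type
names_by_how = {}
for how in ["inner", "outer", "left", "right"]:
    result = pd.merge(foods, drinks, on="Name", how=how)
    names_by_how[how] = sorted(result["Name"])
    print(how, names_by_how[how])

# inner ['Mary']
# outer ['Joseph', 'Mary', 'Paul', 'Peter']
# left  ['Mary', 'Paul', 'Peter']
# right ['Joseph', 'Mary']
```

The names are sorted only to make the comparison easy to read. The row order of the actual merge results depends on the join type.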


# Exercise USA States

![mania](https://media.giphy.com/media/xT39D7ubkIUIrgX7JS/giphy.gif)

In [19]:
# Following are shell commands to download the data into your current directory
!curl -O https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-population.csv
!curl -O https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-areas.csv
!curl -O https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-abbrevs.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 57935  100 57935    0     0   199k      0 --:--:-- --:--:-- --:--:--  201k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   835  100   835    0     0   3614      0 --:--:-- --:--:-- --:--:--  3614
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   872  100   872    0     0   3136      0 --:--:-- --:--:-- --:--:--  3125


# Population Data

In [21]:
population = pd.read_csv("state-population.csv")
population.head()

Unnamed: 0,state/region,ages,year,population
0,AL,under18,2012,1117489.0
1,AL,total,2012,4817528.0
2,AL,under18,2010,1130966.0
3,AL,total,2010,4785570.0
4,AL,under18,2011,1125763.0


# Area / Size of State

In [22]:
areas = pd.read_csv("state-areas.csv")
areas.head()

Unnamed: 0,state,area (sq. mi)
0,Alabama,52423
1,Alaska,656425
2,Arizona,114006
3,Arkansas,53182
4,California,163707


# Names Of Each State and Abbreviations

In [23]:
states = pd.read_csv("state-abbrevs.csv")
states.head()

Unnamed: 0,state,abbreviation
0,Alabama,AL
1,Alaska,AK
2,Arizona,AZ
3,Arkansas,AR
4,California,CA


# Outer Join
An **_Outer Join_** returns all records from both tables. 


<img src='./images/outerjoin.png'>


+ Given this information, say we want to compute a relatively straightforward result: rank US states and territories by their 2010 population density. We clearly have the data here to find this result, but we'll have to combine the datasets to find the result.

+ We'll start with a many-to-one merge that will give us the full state name within the population DataFrame. We want to merge based on the `state/region` column of `population` and the `abbreviation` column of `states`. We'll use `how='outer'` to make sure no data is thrown away due to mismatched labels.

In [27]:
outerjoin = pd.merge(population , states , how="outer" , left_on="state/region",
                    right_on = "abbreviation")
outerjoin.head()

Unnamed: 0,state/region,ages,year,population,state,abbreviation
0,AL,under18,2012,1117489.0,Alabama,AL
1,AL,total,2012,4817528.0,Alabama,AL
2,AL,under18,2010,1130966.0,Alabama,AL
3,AL,total,2010,4785570.0,Alabama,AL
4,AL,under18,2011,1125763.0,Alabama,AL


**_NOTE:_** The `state/region` and `abbreviation` columns both hold the same data, so we can drop one of them.

In [28]:
outerjoin.drop("abbreviation" , axis=1, inplace=True)
outerjoin.head()

Unnamed: 0,state/region,ages,year,population,state
0,AL,under18,2012,1117489.0,Alabama
1,AL,total,2012,4817528.0,Alabama
2,AL,under18,2010,1130966.0,Alabama
3,AL,total,2010,4785570.0,Alabama
4,AL,under18,2011,1125763.0,Alabama


#  &#9758; Observation:
+ We can see from the above that the `abbreviation` column has been dropped.

Now let's double-check whether there were any mismatches here, which we can do by looking for rows with nulls:


In [29]:
outerjoin.isna().any()

state/region    False
ages            False
year            False
population       True
state            True
dtype: bool

#  &#9758; Observation:
+ Some of the population info is null; let's figure out which these are!

In [30]:
outerjoin[outerjoin["population"].isna()].head()

Unnamed: 0,state/region,ages,year,population,state
2448,PR,under18,1990,,
2449,PR,total,1990,,
2450,PR,total,1991,,
2451,PR,under18,1991,,
2452,PR,total,1993,,


#  &#9758; Observation:

+ It appears that all the null population values are from Puerto Rico prior to the year 2000; this is likely due to this data not being available from the original source.

+ More importantly, we also see that some of the new `state` entries are null, which means that there was no corresponding entry in the `states` table! Let's figure out which regions lack this match:

In [31]:
outerjoin.loc[outerjoin['state'].isnull(), 'state/region'].unique()

array(['PR', 'USA'], dtype=object)

+ We can quickly infer the issue: our population data includes entries for Puerto Rico (PR) and the United States as a whole (USA), while these entries do not appear in the state abbreviation key. We can fix these quickly by filling in appropriate entries:

In [32]:
outerjoin.loc[outerjoin['state/region'] == 'PR', 'state'] = 'Puerto Rico'
outerjoin.loc[outerjoin['state/region'] == 'USA', 'state'] = 'United States'
outerjoin.isnull().any()

state/region    False
ages            False
year            False
population       True
state           False
dtype: bool

#  &#9758; Observation:

+ No more nulls in the state column: we're all set!

+ Now we can merge the result with the area data using a similar procedure. Examining our results, we will want to join on the state column in both:

In [33]:
final_df = pd.merge(outerjoin , areas , on="state" , how="left")
final_df.head()

Unnamed: 0,state/region,ages,year,population,state,area (sq. mi)
0,AL,under18,2012,1117489.0,Alabama,52423.0
1,AL,total,2012,4817528.0,Alabama,52423.0
2,AL,under18,2010,1130966.0,Alabama,52423.0
3,AL,total,2010,4785570.0,Alabama,52423.0
4,AL,under18,2011,1125763.0,Alabama,52423.0


+ Again, let's check for nulls to see if there were any mismatches:

In [62]:
final_df.isnull().any()

state/region     False
ages             False
year             False
population        True
state            False
area (sq. mi)     True
dtype: bool

# Drop all null values from the df

In [63]:
final_df.dropna(inplace=True)

In [64]:
# Check results
final_df.isna().any().any()

False
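
With the nulls removed, the density ranking set as the goal of this exercise could be computed roughly as follows. This is only a sketch: `sample` is a tiny stand-in with the same columns as `final_df`, in which the Alabama rows come from the output shown above and the "XX" / "Exampleland" rows are made up so that there is something to rank.

```python
import pandas as pd

# Stand-in for final_df: Alabama values are from the lesson output,
# the "XX" / "Exampleland" rows are invented for illustration
sample = pd.DataFrame({
    "state/region": ["AL", "AL", "XX", "XX"],
    "ages": ["under18", "total", "total", "total"],
    "year": [2010, 2010, 2010, 2012],
    "population": [1130966.0, 4785570.0, 1000000.0, 1100000.0],
    "state": ["Alabama", "Alabama", "Exampleland", "Exampleland"],
    "area (sq. mi)": [52423.0, 52423.0, 5000.0, 5000.0],
})

# Keep only the total population for the year 2010
data2010 = sample[(sample["year"] == 2010) & (sample["ages"] == "total")]

# Index by state name, then divide population by area
data2010 = data2010.set_index("state")
density = data2010["population"] / data2010["area (sq. mi)"]

# Sort so the most densely populated region comes first
density = density.sort_values(ascending=False)
print(density)
```

Running the same three steps on `final_df` instead of `sample` should give the ranking for the real data.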

## Summary

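In this lesson, you learned how to use concatenation to stack multiple DataFrames in pandas, and how to combine DataFrames on keys with `pd.merge()` and `.join()` using inner, outer, left and right joins.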


![USA](https://media.giphy.com/media/3ohzdZgKrpxSDzbvPO/giphy.gif)