# Concatenating Data

## 1. Appending pandas Series
In this exercise, you'll load sales data from the months January, February, and March into DataFrames. Then, you'll extract Series with the `'Units'` column from each and append them together with method chaining using `.append()`.

To check that the stacking worked, you'll print slices from these Series, and finally, you'll sum the result to find the total units sold in the first quarter.

In [1]:
# Importing required packages
import pandas as pd

In [2]:
# Load 'sales-jan-2015.csv' into a DataFrame: jan
jan = pd.read_csv("data/Sales/sales-jan-2015.csv", index_col = "Date", parse_dates = True)

# Load 'sales-feb-2015.csv' into a DataFrame: feb
feb = pd.read_csv("data/Sales/sales-feb-2015.csv", index_col = "Date", parse_dates = True)

# Load 'sales-mar-2015.csv' into a DataFrame: mar
mar = pd.read_csv("data/Sales/sales-mar-2015.csv", index_col = "Date", parse_dates = True)

In [3]:
# Print tail of all three dataframes
print(jan.tail())
print(feb.tail())
print(mar.tail())

                             Company   Product  Units
Date                                                 
2015-01-06 13:47:37  Acme Coporation  Software     16
2015-01-15 15:33:40        Mediacore  Hardware      7
2015-01-27 07:11:55        Streeplex   Service     18
2015-01-20 11:28:02        Streeplex  Software     13
2015-01-16 19:20:46        Mediacore   Service      8
                       Company   Product  Units
Date                                           
2015-02-19 16:02:58  Mediacore   Service     10
2015-02-19 10:59:33  Mediacore  Hardware     16
2015-02-02 20:54:49  Mediacore  Hardware      9
2015-02-21 05:01:26  Mediacore  Software      3
2015-02-21 20:41:47      Hooli  Hardware      3
                       Company   Product  Units
Date                                           
2015-03-13 11:40:16    Initech  Software     11
2015-03-27 08:29:45  Mediacore  Software      6
2015-03-21 06:42:41  Mediacore  Hardware     19
2015-03-15 08:50:45    Initech  Hardware     1

In [4]:
# Extract the 'Units' column from jan: jan_units
jan_units = jan['Units']

# Extract the 'Units' column from feb: feb_units
feb_units = feb['Units']

# Extract the 'Units' column from mar: mar_units
mar_units = mar['Units']

# Append feb_units and then mar_units to jan_units: quarter1
quarter1 = jan_units.append(feb_units).append(mar_units)

# Print the tail of quarter1
quarter1.tail()

Date
2015-03-13 11:40:16    11
2015-03-27 08:29:45     6
2015-03-21 06:42:41    19
2015-03-15 08:50:45    18
2015-03-13 16:25:24     9
Name: Units, dtype: int64

In [5]:
# Print the first slice from quarter1
quarter1.loc['jan 27, 2015':'feb 2, 2015']

Date
2015-01-27 07:11:55    18
2015-02-02 08:33:01     3
2015-02-02 20:54:49     9
Name: Units, dtype: int64

In [6]:
# Print the second slice from quarter1
quarter1.loc['feb 26, 2015':"mar 7, 2015"]

Date
2015-02-26 08:57:45     4
2015-02-26 08:58:51     1
2015-03-06 10:11:45    17
2015-03-06 02:03:56    17
Name: Units, dtype: int64

In [7]:
# Compute & print total sales in quarter1
quarter1.sum()

642

Well done! As you can see, appending pandas Series is very straightforward!



## 2. Concatenating pandas Series along row axis
Having learned how to append Series, you'll now learn how to achieve the same result by concatenating Series instead. You'll continue to work with the sales data you've seen previously.

Your job is to use `pd.concat()` with a list of Series to achieve the same result that you would get by chaining calls to `.append()`.

You may be wondering about the difference between `pd.concat()` and pandas' `.append()` method. One way to think of the difference is that `.append()` is a specific case of a concatenation, while `pd.concat()` gives you more flexibility.

In [8]:
# Initialize empty list: units
units = []

# Build the list of Series
for month in [jan, feb, mar]:
    units.append(month["Units"])

# Print the length of the units list plus the length of each Series within it
print(len(units), len(units[0]), len(units[1]), len(units[2]))

3 20 20 20


In [9]:
# Concatenate the list: quarter1
quarter1 = pd.concat(units, axis = "rows")

# Print slices from quarter1
print(quarter1.loc['jan 27, 2015':'feb 2, 2015'])
print(quarter1.loc['feb 26, 2015':'mar 7, 2015'])

Date
2015-01-27 07:11:55    18
2015-02-02 08:33:01     3
2015-02-02 20:54:49     9
Name: Units, dtype: int64
Date
2015-02-26 08:57:45     4
2015-02-26 08:58:51     1
2015-03-06 10:11:45    17
2015-03-06 02:03:56    17
Name: Units, dtype: int64


Great work! As in this exercise, you can achieve the same results as appending by concatenating along the row axis.
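One practical reason to prefer the concatenation form: `Series.append()` and `DataFrame.append()` were deprecated in pandas 1.4 and removed in pandas 2.0, so the chained `.append()` calls only run on older releases. The sketch below shows the `pd.concat()` equivalent on three tiny Series. The values and dates are made up for illustration and are not taken from the sales files.

```python
import pandas as pd

# Tiny stand-ins for jan_units, feb_units and mar_units (made-up values)
jan_units = pd.Series([16, 7], index=pd.to_datetime(["2015-01-06", "2015-01-27"]), name="Units")
feb_units = pd.Series([3, 9], index=pd.to_datetime(["2015-02-02", "2015-02-21"]), name="Units")
mar_units = pd.Series([11, 6], index=pd.to_datetime(["2015-03-13", "2015-03-27"]), name="Units")

# Same stacking as jan_units.append(feb_units).append(mar_units)
quarter1 = pd.concat([jan_units, feb_units, mar_units], axis="rows")

# Slice across the month boundary (the index is sorted in this toy example)
first_slice = quarter1.loc["2015-01-27":"2015-02-02"]

print(first_slice)
print(quarter1.sum())
```

Because `pd.concat()` takes the whole list at once, it also avoids creating an intermediate object for every chained call.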



## 3. Appending DataFrames with `ignore_index`
In this exercise, you'll use the [Baby Names Dataset](https://www.data.gov/developers/baby-names-dataset/) (from data.gov) again.

You'll use the DataFrame `.append()` method to make a DataFrame `combined_names`. To distinguish rows from the original two DataFrames, you'll add a `'year'` column to each with the year (1881 or 1981 in this case). In addition, you'll specify `ignore_index=True` so that the index values are not used along the concatenation axis. The resulting axis will instead be labeled `0, 1, ..., n-1`, which is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information.
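To see what `ignore_index=True` changes, here is a minimal sketch on two small made-up frames (`names_a` and `names_b` are hypothetical stand-ins, not the baby names files). It uses `pd.concat()`, which accepts the same `ignore_index` argument.

```python
import pandas as pd

# Two small made-up frames standing in for names_1881 and names_1981
names_a = pd.DataFrame({"name": ["Mary", "John"], "count": [10, 8]})
names_b = pd.DataFrame({"name": ["Anna", "Morgan", "Zoltan"], "count": [7, 5, 5]})
names_a["year"] = 1881
names_b["year"] = 1981

# Without ignore_index, the original row labels are kept and repeat
kept = pd.concat([names_a, names_b])

# With ignore_index=True, the rows are relabeled 0, 1, ..., n-1
fresh = pd.concat([names_a, names_b], ignore_index=True)

print(list(kept.index))
print(list(fresh.index))
```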

In [23]:
# Loading dataframes
names_1981 = pd.read_csv('data/Baby names/names1981.csv', header=None, names=['name','gender','count'])
names_1981.tail()

           name  gender  count
19450  Zeferino       M      5
19451   Zerrick       M      5
19452  Zimbabwe       M      5
19453    Zoltan       M      5
19454    Zuriel       M      5


In [24]:
# Loading dataframes
names_1881 = pd.read_csv('data/Baby names/names1881.csv', header=None, names=['name','gender','count'])
names_1881.tail()

        name  gender  count
1930  Wiliam       M      5
1931  Wilton       M      5
1932    Wing       M      5
1933    Wood       M      5
1934  Wright       M      5


In [25]:
# Add 'year' column to names_1881 and names_1981
names_1881['year'] = 1881
names_1981['year'] = 1981

# Append names_1981 after names_1881 with ignore_index=True: combined_names
combined_names = names_1881.append(names_1981, ignore_index=True)

# Print shapes of names_1981, names_1881, and combined_names
print(names_1981.shape)
print(names_1881.shape)
print(combined_names.shape)

(19455, 4)
(1935, 4)
(21390, 4)


In [28]:
# Print all rows that contain the name 'Morgan'
combined_names.loc[combined_names.name == "Morgan"]

         name  gender  count  year
1283   Morgan       M     23  1881
2096   Morgan       F   1769  1981
14390  Morgan       M    766  1981


## 4. Concatenating pandas DataFrames along column axis
The function `pd.concat()` can concatenate DataFrames horizontally as well as vertically (vertical is the default). To make the DataFrames stack horizontally, you have to specify the keyword argument `axis=1` or `axis='columns'`.

In this exercise, you'll use weather data with maximum and mean daily temperatures. You'll concatenate the columns of both and see that, where rows are missing in the coarser DataFrame, `NaN` values are inserted in the concatenated DataFrame. This corresponds to an outer join (which you will explore in more detail in later exercises).

The files `'quarterly_max_temp.csv'` and `'monthly_mean_temp.csv'` have been pre-loaded into the DataFrames `weather_max` and `weather_mean` respectively, and `pandas` has been imported as `pd`.

In [42]:
# Loading data for this exercise
weather_max = pd.read_csv("data/monthly_max_temp.csv", index_col = "Month").loc[["Jan", "Apr", "Jul", "Oct"]]

# Print weather_max
weather_max

       Max TemperatureF
Month
Jan                  68
Apr                  84
Jul                  91
Oct                  84


In [66]:
# Loading data for this exercise
mean = [53.100000,70.000000,34.935484,28.714286, 32.354839, 72.870968, 70.133333, 35.000000, 
        62.612903 , 39.800000, 55.451613,63.766667]
month = ["Apr","Aug", "Dec","Feb",  "Jan", "Jul", "Jun", "Mar", "May", "Nov", "Oct", "Sep"]
weather_mean = pd.DataFrame({"Mean TempeartureF": mean}, index=month)
weather_mean.index.name = "Month"

# Print weather_mean
weather_mean

       Mean TempeartureF
Month
Apr                 53.1
Aug                 70.0
Dec            34.935484
Feb            28.714286
Jan            32.354839
Jul            72.870968
Jun            70.133333
Mar                 35.0
May            62.612903
Nov                 39.8


In [71]:
# Create a list of weather_max and weather_mean
weather_list = [weather_max, weather_mean]

# Concatenate weather_list horizontally
weather = pd.concat(weather_list, axis =1, sort = True)

# Print weather
weather

     Max TemperatureF  Mean TempeartureF
Apr              84.0               53.1
Aug               NaN               70.0
Dec               NaN          34.935484
Feb               NaN          28.714286
Jan              68.0          32.354839
Jul              91.0          72.870968
Jun               NaN          70.133333
Mar               NaN               35.0
May               NaN          62.612903
Nov               NaN               39.8


Well done! This is where you start to see the advantages of concatenating over appending.
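The `NaN` filling seen in this exercise can be reproduced on a much smaller scale. The following is a minimal sketch with made-up temperatures; `quarterly` and `monthly` are hypothetical names chosen for the illustration.

```python
import pandas as pd

# A coarse frame (two labels) and a finer one (four labels); values are made up
quarterly = pd.DataFrame({"Max": [68, 84]}, index=["Jan", "Apr"])
monthly = pd.DataFrame({"Mean": [32.4, 28.7, 35.0, 53.1]}, index=["Jan", "Feb", "Mar", "Apr"])

# axis="columns" places the frames side by side; the default join is "outer",
# so every label from either frame is kept
combined = pd.concat([quarterly, monthly], axis="columns")

# Labels that exist only in the finer frame get NaN in the coarser column
missing = combined["Max"].isna().sum()

print(combined)
print(missing)
```

Note that the `Max` column becomes floating point after concatenation, because integer columns cannot hold `NaN` by default.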



## 5. Reading multiple files to build a DataFrame
It is often convenient to build a large DataFrame by parsing many files as DataFrames and concatenating them all at once. You'll do this here with three files, but, in principle, this approach can be used to combine data from dozens or hundreds of files.

Here, you'll work with DataFrames compiled from The Guardian's Olympic medal dataset.

The expression `"%s_top5.csv" % medal` evaluates as a string with the value of medal replacing `%s` in the format string.
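As a quick illustration of that expression, the sketch below builds the three file names with `%` formatting and, for comparison, with an f-string, which is the more common spelling in recent Python code. No files are read here.

```python
medal_types = ["bronze", "silver", "gold"]

# Old-style % formatting, as used in this exercise
percent_names = ["%s_top5.csv" % medal for medal in medal_types]

# Equivalent f-string form
fstring_names = [f"{medal}_top5.csv" for medal in medal_types]

print(percent_names)
print(percent_names == fstring_names)
```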

In [81]:
# Initialize an empty list: medals
medals = []
medal_types = ['bronze', 'silver', 'gold']

for medal in medal_types:
    # Create the file name: file_name
    file_name = "data/Summer Olympic medals/%s_top5.csv" % medal
    # Create list of column names: columns
    columns = ['Country', medal]
    # Read file_name into a DataFrame: medal_df
    medal_df = pd.read_csv(file_name, names = columns, header = 0, index_col= "Country")
    # Append medal_df to medals
    medals.append(medal_df)

# Print medals dataframes
print(medals[0],"\n\n", medals[1], "\n\n",medals[2])

                bronze
Country               
United States   1052.0
Soviet Union     584.0
United Kingdom   505.0
France           475.0
Germany          454.0 

                 silver
Country               
United States   1195.0
Soviet Union     627.0
United Kingdom   591.0
France           461.0
Italy            394.0 

                   gold
Country               
United States   2088.0
Soviet Union     838.0
United Kingdom   498.0
Italy            460.0
Germany          407.0


In [86]:
# Concatenate medals horizontally: medals_df
medals_df = pd.concat(medals, axis="columns", sort = True)

# Print medals_df
medals_df

                bronze  silver    gold
France           475.0   461.0     NaN
Germany          454.0     NaN   407.0
Italy              NaN   394.0   460.0
Soviet Union     584.0   627.0   838.0
United Kingdom   505.0   591.0   498.0
United States   1052.0  1195.0  2088.0


Fantastic! Being able to build DataFrames from multiple files like this can be incredibly useful.



## 6. Concatenating vertically to get MultiIndexed rows
When stacking a sequence of DataFrames vertically, it is sometimes desirable to construct a MultiIndex to indicate the DataFrame from which each row originated. This can be done by specifying the `keys` parameter in the call to `pd.concat()`, which generates a hierarchical index with the labels from `keys` as the outermost index level. As a result, you don't have to rename the columns of each DataFrame as you load it; only the index column needs to be specified.

Here, you'll continue working with DataFrames compiled from The Guardian's Olympic medal dataset.

In [95]:
medals = []

for medal in medal_types:

    file_name = "data/Summer Olympic medals/%s_top5.csv" % medal
    
    # Read file_name into a DataFrame: medal_df
    medal_df = pd.read_csv(file_name, index_col="Country")
    
    # Append medal_df to medals
    medals.append(medal_df)
    
# Concatenate medals: medals_df
medals_df = pd.concat(medals, keys = ["bronze", "silver", "gold"], sort=True)

# Print medals in entirety
medals_df

                        Total
       Country
bronze United States   1052.0
       Soviet Union     584.0
       United Kingdom   505.0
       France           475.0
       Germany          454.0
silver United States   1195.0
       Soviet Union     627.0
       United Kingdom   591.0
       France           461.0
       Italy            394.0


Well done! Notice the MultiIndex of medals.
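The effect of `keys` can be checked without the CSV files. Here is a minimal sketch that builds two small frames in memory (using the United States and Soviet Union totals shown in the output) and stacks them.

```python
import pandas as pd

countries = pd.Index(["United States", "Soviet Union"], name="Country")
bronze = pd.DataFrame({"Total": [1052.0, 584.0]}, index=countries)
silver = pd.DataFrame({"Total": [1195.0, 627.0]}, index=countries)

# keys adds an outer index level that records which frame each row came from
stacked = pd.concat([bronze, silver], keys=["bronze", "silver"])

print(stacked.index.nlevels)
print(stacked.loc[("silver", "Soviet Union"), "Total"])
```

The outer level has no name by default; `pd.concat()` also accepts a `names` argument if you want to label the levels.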



## 7. Slicing MultiIndexed DataFrames
This exercise picks up where the last ended (again using The Guardian's Olympic medal dataset).

You are provided with the MultiIndexed DataFrame as produced at the end of the preceding exercise. Your task is to sort the DataFrame and to use `pd.IndexSlice` to extract specific slices.


In [102]:
# Sort the entries of medals: medals_sorted
medals_sorted = medals_df.sort_index(level = 0)

# Print medals_sorted
medals_sorted

                        Total
       Country
bronze France           475.0
       Germany          454.0
       Soviet Union     584.0
       United Kingdom   505.0
       United States   1052.0
gold   Germany          407.0
       Italy            460.0
       Soviet Union     838.0
       United Kingdom   498.0
       United States   2088.0


In [113]:
# Printing first level indexes
medals_df.index.levels[0]

Index(['bronze', 'silver', 'gold'], dtype='object')

In [114]:
# Printing second level indexes
medals_df.index.levels[1]

Index(['France', 'Germany', 'Italy', 'Soviet Union', 'United Kingdom',
       'United States'],
      dtype='object', name='Country')

In [115]:
# Print the number of Bronze medals won by Germany
medals_sorted.loc[('bronze','Germany')]

Total    454.0
Name: (bronze, Germany), dtype: float64

In [116]:
# Print data about silver medals
medals_sorted.loc['silver']

                 Total
Country
France           461.0
Italy            394.0
Soviet Union     627.0
United Kingdom   591.0
United States   1195.0


In [118]:
# Create alias for pd.IndexSlice: idx
idx = pd.IndexSlice

# Print all the data on medals won by the United Kingdom
medals_sorted.loc[idx[:, "United Kingdom"], :]

                       Total
       Country
bronze United Kingdom  505.0
gold   United Kingdom  498.0
silver United Kingdom  591.0


Great work! It looks like only the United States and the Soviet Union have won more Silver medals than the United Kingdom.
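The sort-then-slice pattern from this exercise can be sketched on a four-row frame built in memory. The totals are the ones shown above; the level name `"medal"` is an assumption added for readability, since the exercise's outer level is unnamed.

```python
import pandas as pd

# Deliberately unsorted MultiIndex
index = pd.MultiIndex.from_tuples(
    [
        ("silver", "United States"),
        ("bronze", "United Kingdom"),
        ("silver", "United Kingdom"),
        ("bronze", "United States"),
    ],
    names=["medal", "Country"],
)
medals = pd.DataFrame({"Total": [1195.0, 505.0, 591.0, 1052.0]}, index=index)

# Sorting first keeps label-based slicing predictable and fast
medals_sorted = medals.sort_index()

# idx[:, "United Kingdom"] means: every outer label, one inner label
idx = pd.IndexSlice
uk = medals_sorted.loc[idx[:, "United Kingdom"], :]

print(uk)
```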



## 8. Concatenating horizontally to get MultiIndexed columns
It is also possible to construct a DataFrame with hierarchically indexed columns. For this exercise, you'll start with pandas imported and a list of three DataFrames called `dataframes`. All three DataFrames contain `'Company'`, `'Product'`, and `'Units'` columns, with a `'Date'` column as the index, and record sales transactions during February 2015. The first DataFrame describes Hardware transactions, the second describes Software transactions, and the third, Service transactions.

Your task is to concatenate the DataFrames horizontally and to create a MultiIndex on the columns. From there, you can summarize the resulting DataFrame and slice some information from it.

In [122]:
# Loading in the dataframes
hardware = pd.read_csv("data/Sales/feb-sales-Hardware.csv", index_col = "Date", parse_dates = True)
software = pd.read_csv("data/Sales/feb-sales-Software.csv", index_col = "Date", parse_dates = True)
service = pd.read_csv("data/Sales/feb-sales-Service.csv", index_col = "Date", parse_dates = True)

In [123]:
# Constructing a list of dataframes
dataframes = [hardware, software, service]

# print the list of dataframes
dataframes

[                             Company   Product  Units
 Date                                                 
 2015-02-04 21:52:45  Acme Coporation  Hardware     14
 2015-02-07 22:58:10  Acme Coporation  Hardware      1
 2015-02-19 10:59:33        Mediacore  Hardware     16
 2015-02-02 20:54:49        Mediacore  Hardware      9
 2015-02-21 20:41:47            Hooli  Hardware      3,
                              Company   Product  Units
 Date                                                 
 2015-02-16 12:09:19            Hooli  Software     10
 2015-02-03 14:14:18          Initech  Software     13
 2015-02-02 08:33:01            Hooli  Software      3
 2015-02-05 01:53:06  Acme Coporation  Software     19
 2015-02-11 20:03:08          Initech  Software      7
 2015-02-09 13:09:55        Mediacore  Software      7
 2015-02-11 22:50:44            Hooli  Software      4
 2015-02-04 15:36:29        Streeplex  Software     13
 2015-02-21 05:01:26        Mediacore  Software      3,
        

In [124]:
# Concatenate dataframes: february
february = pd.concat(dataframes, axis =1 , keys = ["Hardware", "Software", "Service"])

# Print february.info()
february.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 20 entries, 2015-02-02 08:33:01 to 2015-02-26 08:58:51
Data columns (total 9 columns):
(Hardware, Company)    5 non-null object
(Hardware, Product)    5 non-null object
(Hardware, Units)      5 non-null float64
(Software, Company)    9 non-null object
(Software, Product)    9 non-null object
(Software, Units)      9 non-null float64
(Service, Company)     6 non-null object
(Service, Product)     6 non-null object
(Service, Units)       6 non-null float64
dtypes: float64(3), object(6)
memory usage: 1.6+ KB


In [130]:
# Print the february dataframe
february

                            Hardware                          Software                           Service
                             Company   Product  Units          Company   Product  Units          Company   Product  Units
Date
2015-02-02 08:33:01              NaN       NaN    NaN            Hooli  Software    3.0              NaN       NaN    NaN
2015-02-02 20:54:49        Mediacore  Hardware    9.0              NaN       NaN    NaN              NaN       NaN    NaN
2015-02-03 14:14:18              NaN       NaN    NaN          Initech  Software   13.0              NaN       NaN    NaN
2015-02-04 15:36:29              NaN       NaN    NaN        Streeplex  Software   13.0              NaN       NaN    NaN
2015-02-04 21:52:45  Acme Coporation  Hardware   14.0              NaN       NaN    NaN              NaN       NaN    NaN
2015-02-05 01:53:06              NaN       NaN    NaN  Acme Coporation  Software   19.0              NaN       NaN    NaN
2015-02-05 22:05:03              NaN       NaN    NaN              NaN       NaN    NaN            Hooli   Service   10.0
2015-02-07 22:58:10  Acme Coporation  Hardware    1.0              NaN       NaN    NaN              NaN       NaN    NaN
2015-02-09 08:57:30              NaN       NaN    NaN              NaN       NaN    NaN        Streeplex   Service   19.0
2015-02-09 13:09:55              NaN       NaN    NaN        Mediacore  Software    7.0              NaN       NaN    NaN


In [141]:
# Print columns of february
february.columns

MultiIndex(levels=[['Hardware', 'Software', 'Service'], ['Company', 'Product', 'Units']],
           codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])

In [126]:
# Assign pd.IndexSlice: idx
idx = pd.IndexSlice

# Create the slice: slice_2_8
slice_2_8 = february.loc['Feb.2, 2015':'Feb.8, 2015', idx[:, 'Company']]

# Print slice_2_8
slice_2_8

                            Hardware         Software          Service
                             Company          Company          Company
Date
2015-02-02 08:33:01              NaN            Hooli              NaN
2015-02-02 20:54:49        Mediacore              NaN              NaN
2015-02-03 14:14:18              NaN          Initech              NaN
2015-02-04 15:36:29              NaN        Streeplex              NaN
2015-02-04 21:52:45  Acme Coporation              NaN              NaN
2015-02-05 01:53:06              NaN  Acme Coporation              NaN
2015-02-05 22:05:03              NaN              NaN            Hooli
2015-02-07 22:58:10  Acme Coporation              NaN              NaN


Excellent work! Working with MultiIndexes and MultiIndexed columns can seem tricky at first, but with practice, it will become second nature.
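For a compact view of how `keys` combined with `axis=1` produces MultiIndexed columns, here is a minimal sketch with two in-memory frames. The three rows are taken from the output above, and the `'Product'` column is left out to keep the example short.

```python
import pandas as pd

hardware = pd.DataFrame(
    {"Company": ["Mediacore"], "Units": [9]},
    index=pd.to_datetime(["2015-02-02 20:54:49"]),
)
software = pd.DataFrame(
    {"Company": ["Hooli", "Initech"], "Units": [3, 13]},
    index=pd.to_datetime(["2015-02-02 08:33:01", "2015-02-03 14:14:18"]),
)

# keys become the outer column level; the original column names form the inner level
february = pd.concat([hardware, software], axis=1, keys=["Hardware", "Software"])

# Select the inner 'Company' column under every outer key
idx = pd.IndexSlice
companies = february.loc[:, idx[:, "Company"]]

print(february.columns.tolist())
print(companies)
```

Since no timestamp appears in both frames, each row is filled in under one outer key only and is `NaN` under the other.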



## 9. Concatenating DataFrames from a dict
You're now going to revisit the sales data you worked with earlier in the chapter. Your task is to aggregate the sum of all sales over the `'Company'` column into a single DataFrame. You'll do this by constructing a dictionary of these DataFrames and then concatenating them.

In [151]:
# Make the list of tuples: month_list
month_list = [("january", jan), ("february", feb), ("march", mar)]

# Print month_list
month_list

[('january',                              Company   Product  Units
  Date                                                 
  2015-01-21 19:13:21        Streeplex  Hardware     11
  2015-01-09 05:23:51        Streeplex   Service      8
  2015-01-06 17:19:34          Initech  Hardware     17
  2015-01-02 09:51:06            Hooli  Hardware     16
  2015-01-11 14:51:02            Hooli  Hardware     11
  2015-01-01 07:31:20  Acme Coporation  Software     18
  2015-01-24 08:01:16          Initech  Software      1
  2015-01-25 15:40:07          Initech   Service      6
  2015-01-13 05:36:12            Hooli   Service      7
  2015-01-03 18:00:19            Hooli   Service     19
  2015-01-16 00:33:47            Hooli  Hardware     17
  2015-01-16 07:21:12          Initech   Service     13
  2015-01-20 19:49:24  Acme Coporation  Hardware     12
  2015-01-26 01:50:25  Acme Coporation  Software     14
  2015-01-15 02:38:25  Acme Coporation   Service     16
  2015-01-06 13:47:37  Acme Coporatio

In [163]:
# Create an empty dictionary: month_dict
month_dict = {}
month_dict_grouped = {}

for month_name, month_data in month_list:

    # Store month_data: month_dict[month_name]
    month_dict[month_name] = month_data
    
    # Group month_data by company and sum the numeric columns: month_dict_grouped[month_name]
    month_dict_grouped[month_name] = month_data.groupby("Company").sum(numeric_only=True)

In [165]:
# Concatenate data in month_dict: sales
sales = pd.concat(month_dict)

# Print sales
sales

                                      Company   Product  Units
         Date
february 2015-02-26 08:57:45        Streeplex   Service      4
         2015-02-16 12:09:19            Hooli  Software     10
         2015-02-03 14:14:18          Initech  Software     13
         2015-02-02 08:33:01            Hooli  Software      3
         2015-02-25 00:29:00          Initech   Service     10
         2015-02-05 01:53:06  Acme Coporation  Software     19
         2015-02-09 08:57:30        Streeplex   Service     19
         2015-02-11 20:03:08          Initech  Software      7
         2015-02-04 21:52:45  Acme Coporation  Hardware     14
         2015-02-09 13:09:55        Mediacore  Software      7


In [167]:
# Concatenate data in month_dict_grouped: sales_grouped
sales_grouped = pd.concat(month_dict_grouped)

# Print sales_grouped
sales_grouped

                          Units
         Company
february Acme Coporation     34
         Hooli               30
         Initech             30
         Mediacore           45
         Streeplex           37
january  Acme Coporation     76
         Hooli               70
         Initech             37
         Mediacore           15
         Streeplex           50


In [157]:
# Print the grouped sales for Mediacore
idx = pd.IndexSlice
sales_grouped.loc[idx[:, 'Mediacore'], :]

                    Units
         Company
february Mediacore     45
january  Mediacore     15
march    Mediacore     68


Well done! Now that you've mastered the basics of concatenating your data, it's time to learn about different types of joins!
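The dictionary-based pattern can be sketched without the sales files. In the example below the two monthly frames are tiny made-up stand-ins, and the grouped result is sorted before slicing so that the lookup does not depend on the order of the dictionary keys.

```python
import pandas as pd

# Tiny made-up stand-ins for the monthly sales DataFrames
jan = pd.DataFrame({"Company": ["Hooli", "Mediacore", "Hooli"], "Units": [16, 7, 11]})
feb = pd.DataFrame({"Company": ["Mediacore", "Hooli"], "Units": [10, 3]})

# Total units per company for each month
grouped = {
    "january": jan.groupby("Company")[["Units"]].sum(),
    "february": feb.groupby("Company")[["Units"]].sum(),
}

# Passing a dict to pd.concat turns its keys into the outer index level
sales = pd.concat(grouped).sort_index()

# All months for one company
idx = pd.IndexSlice
mediacore = sales.loc[idx[:, "Mediacore"], :]

print(sales)
print(mediacore)
```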



## 10. Concatenating DataFrames with inner join
Here, you'll continue working with DataFrames compiled from The Guardian's Olympic medal dataset.

Your task is to compute an inner join.

In [208]:
# Loading in the data for this exercise
file_path = "data/Summer Olympic medals/"
bronze = pd.read_csv(file_path + "bronze.csv", usecols=(1,2), index_col = "Country").iloc[0:5, :]
silver = pd.read_csv(file_path + "silver.csv", usecols=(1,2), index_col = "Country").iloc[2:8, :]
gold = pd.read_csv(file_path + "gold.csv", usecols=(1,2), index_col = "Country").iloc[1:6, :]

In [214]:
# Create the list of DataFrames: medal_list
medal_list = [bronze, silver, gold]

# Print medal_list
print(medal_list[0],"\n\n", medal_list[1],"\n\n", medal_list[2])

                 Total
Country               
United States   1052.0
Soviet Union     584.0
United Kingdom   505.0
France           475.0
Germany          454.0 

                 Total
Country              
United Kingdom  591.0
France          461.0
Germany         350.0
Australia       369.0
Italy           394.0
Hungary         308.0 

                 Total
Country              
Soviet Union    838.0
United Kingdom  498.0
France          378.0
Germany         407.0
Australia       293.0


In [215]:
# Concatenate medal_list horizontally using an inner join: medals
medals = pd.concat(medal_list, keys=["bronze", "silver", "gold"], axis = 1, join = "inner")

# Print medals
medals

                bronze  silver   gold
                 Total   Total  Total
Country
United Kingdom   505.0   591.0  498.0
France           475.0   461.0  378.0
Germany          454.0   350.0  407.0


Well done! The United States, Soviet Union, Australia, Italy, and Hungary got dropped as part of the join since they are not present in each of bronze, silver, and gold. Therefore, the final DataFrame has only the United Kingdom, France, and Germany.
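To compare the two join modes side by side, here is a minimal sketch with two small Series. The totals are taken from the tables above, but each Series is cut down to three countries for brevity.

```python
import pandas as pd

bronze = pd.Series(
    {"United States": 1052.0, "United Kingdom": 505.0, "France": 475.0}, name="bronze"
)
silver = pd.Series(
    {"United Kingdom": 591.0, "France": 461.0, "Italy": 394.0}, name="silver"
)

# Default outer join: union of the labels, with NaN where a country is missing
outer = pd.concat([bronze, silver], axis=1)

# Inner join: only the labels present in every input
inner = pd.concat([bronze, silver], axis=1, join="inner")

print(outer)
print(inner)
```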



## 11. Resampling & concatenating DataFrames with inner join
In this exercise, you'll compare the historical 10-year GDP (Gross Domestic Product) growth in the US and in China. The data for the US starts in 1947 and is recorded quarterly; by contrast, the data for China starts in 1961 and is recorded annually.

You'll need to use a combination of resampling and an inner join to align the index labels. You'll need an appropriate offset alias for resampling, and `.resample()` must be chained with an aggregation method (`.last()` in this case) before `.pct_change()` computes the growth.

In [229]:
# Load in china GDP
china = pd.read_csv("data/GDP/gdp_china.csv", names = ["Year", "China"], 
                    header = 0, index_col = "Year", parse_dates = True)

# Print tail of china Dataframe
china.tail()

                   China
Year
2011-01-01   7492.432098
2012-01-01   8461.623163
2013-01-01     9490.6026
2014-01-01  10351.111762
2015-01-01  10866.443998


In [230]:
# Load in us GDP
us = pd.read_csv("data/GDP/gdp_usa.csv", names = ["Year", "US"], header= 0, 
                  index_col = "Year", parse_dates = True)

# Print tail of us Dataframe
us.tail()

                 US
Year
2015-04-01  17998.3
2015-07-01  18141.9
2015-10-01  18222.8
2016-01-01  18281.6
2016-04-01  18436.5


In [234]:
# Resample and tidy china: china_annual
china_annual = china.resample("A").last().pct_change(10).dropna()

# Print tail of china_annual
china_annual.tail()

               China
Year
2011-12-31  4.623958
2012-12-31  4.788074
2013-12-31  4.752129
2014-12-31  4.330828
2015-12-31  3.789936


In [235]:
# Resample and tidy us: us_annual
# Chain .pct_change(10) after the aggregation to compute the percentage change over ten years.
us_annual = us.resample("A").last().pct_change(10).dropna()

# Print tail of us_annual
us_annual.tail()

                  US
Year
2012-12-31  0.467723
2013-12-31  0.438621
2014-12-31  0.408368
2015-12-31   0.36178
2016-12-31  0.310677


In [236]:
# Concatenate china_annual and us_annual: gdp
# The inner join keeps only the years present in both DataFrames.
gdp = pd.concat([china_annual, us_annual], join="inner", axis = 1)

# Resample gdp and print
gdp.resample('10A').last()

               China        US
Year
1970-12-31  0.546128  1.017187
1980-12-31  1.072537  1.742556
1990-12-31   0.89282  1.012126
2000-12-31  2.357522  0.738632
2010-12-31  4.011081  0.454332
2020-12-31  3.789936   0.36178


Great work! It looks like the 10-year GDP growth of China has been higher than that of the US since the 1990s.
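The resample, percentage change, and inner join steps can be sketched on a few made-up numbers. Two assumptions are worth flagging: the sketch uses a one-year lag (`pct_change(1)`) instead of ten to keep the data small, and the hypothetical helper `year_end` tries the `"YE"` alias first because pandas 2.2 and later prefer it over `"A"`.

```python
import pandas as pd

# Made-up quarterly and annual series (illustrative values only)
quarterly = pd.DataFrame(
    {"US": [100.0, 101.0, 102.0, 104.0, 105.0, 107.0,
            108.0, 110.0, 111.0, 112.0, 114.0, 121.0]},
    index=pd.date_range("2000-01-01", periods=12, freq="QS"),
)
annual = pd.DataFrame(
    {"China": [50.0, 60.0, 75.0]},
    index=pd.to_datetime(["2001-01-01", "2002-01-01", "2003-01-01"]),
)


def year_end(frame):
    """Resample to year-end labels, keeping the last observation of each year."""
    try:
        return frame.resample("YE").last()  # alias in pandas 2.2 and later
    except ValueError:
        return frame.resample("A").last()   # alias in earlier releases


# Aggregate first, then compute the change relative to the previous year
us_growth = year_end(quarterly).pct_change(1).dropna()
china_growth = year_end(annual).pct_change(1).dropna()

# The inner join keeps only the year-end labels present in both frames
gdp = pd.concat([china_growth, us_growth], axis=1, join="inner")

print(gdp)
```

Resampling both frames to the same year-end labels is what makes the join line up: before resampling, the annual data is stamped on January 1 and the quarterly data on quarter starts.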

