# Assignment 1
**Emma McCready**

---

### Analysis of population over time

You are required to collect, process, analyse and interpret the data in order to identify possible issues/problems at present and make predictions/classifications in regard to the future. This analysis will rely on the available data from CSO and any additional data you deem necessary (with supporting evidence) to support your hypothesis for this scenario.

Areas of focus:
* Annual Population Change
* Population Forecasting

To do:

* Reorganise this sheet into Data Cleaning, then Data Prep (if this is the correct order?), then EDA, and so on
* Maybe write a function to automatically prepare the data for each type of migration?
* I'd rather the net migration values were stored as full numbers instead of in thousands
* Investigate "All countries" under countries dataset

--- 

- diagnosing the “tidiness” of the data — how much data cleaning we will have to do
- reshaping the data — getting right rows and columns for effective analysis
- combining multiple files
- changing the types of values — how we fix a column where numerical values are stored as strings, for example
- dropping or filling missing values - how we deal with data that is incomplete or missing
- manipulating strings to represent the data better

https://www.cso.ie/en/releasesandpublications/ep/p-pme/populationandmigrationestimatesapril2023/keyfindings/

In [1]:
# Dependencies.. put any packages I install here:
#!pip install flask pandas==2.1.2

In [2]:
# Load in packages

import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt


<summary style="color:blue;">

# 1. Data Inspection
    
First, I need to load in the data and take a quick look at it to get a sense of what I'm dealing with. I'm initially interested in what columns there are and the data types of those columns.

In [3]:
# Load in data

pop_data = pd.read_csv("migration_data.csv")

#printing the first 5 rows
pop_data.head()

Unnamed: 0,STATISTIC Label,Year,Country,Sex,Origin or Destination,UNIT,VALUE
0,Estimated Migration (Persons in April),1987,United Kingdom (1),Both sexes,Net migration,Thousand,-13.7
1,Estimated Migration (Persons in April),1987,United Kingdom (1),Both sexes,Emigrants: All destinations,Thousand,21.8
2,Estimated Migration (Persons in April),1987,United Kingdom (1),Both sexes,Immigrants: All origins,Thousand,8.1
3,Estimated Migration (Persons in April),1987,United Kingdom (1),Male,Net migration,Thousand,-9.0
4,Estimated Migration (Persons in April),1987,United Kingdom (1),Male,Emigrants: All destinations,Thousand,13.1


<summary style="color:blue;">

__Notes on the above:__

The dataframe is already in a tidy format, in that each variable is a separate column and each row is a separate observation. So, at first look, there's no need for `pd.melt()`.
* The statistic label column seems a bit redundant though, and it would be preferable to remove the unit column and have the VALUE column reflect the unit instead (i.e. multiply it by 1000).
* Will need to confirm the variables are stored as the correct data type.


<summary style="color:blue;">

### Getting a sense of the data...

By running `.info()`, I can also check for null values:

In [4]:
pop_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2664 entries, 0 to 2663
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   STATISTIC Label        2664 non-null   object 
 1   Year                   2664 non-null   int64  
 2   Country                2664 non-null   object 
 3   Sex                    2664 non-null   object 
 4   Origin or Destination  2664 non-null   object 
 5   UNIT                   2664 non-null   object 
 6   VALUE                  2104 non-null   float64
dtypes: float64(1), int64(1), object(5)
memory usage: 145.8+ KB


<summary style="color:blue;">

__Notes on the above:__
    
* Shape: there are 2664 rows and 7 columns in the dataset (no need to use `.shape`, as I now have this info);
* the "Year" column and the "VALUE" column are the only numerical variables. While this is correct for VALUE, Year is arguably a categorical variable; as it's stored as a number it should be okay for now. I will look at amending the data types when tidying the data in Section 2 of this codebook;
* I see that there are missing values in the VALUE column (note the difference between the total number of rows (2664) and the count of non-null values for the VALUE column (2104)). To save doing the maths by hand, I'm going to run the following code to get the total number of missing values, and I may as well check that there are no duplicates while I'm at it:

In [5]:
print(pop_data.isnull().sum())

STATISTIC Label            0
Year                       0
Country                    0
Sex                        0
Origin or Destination      0
UNIT                       0
VALUE                    560
dtype: int64


In [6]:
duplicates = pop_data.duplicated()
print(duplicates.value_counts())  

False    2664
Name: count, dtype: int64


<summary style="color:blue;">

###### Getting a closer look, with summaries for the numerical variables:

In [7]:
pop_data.describe()

Unnamed: 0,Year,VALUE
count,2664.0,2104.0
mean,2005.0,8.943726
std,10.679083,15.513703
min,1987.0,-43.9
25%,1996.0,1.8
50%,2005.0,4.7
75%,2014.0,10.2
max,2023.0,151.1


<summary style="color:blue;">

    
__Notes on the above:__
    
* From the min/max of the Year column, I can see that the data covers the years 1987 to 2023;
* Regarding the VALUE column, I note from when I called `pop_data.head()` that this variable contains the net migration as well as the number of incoming and outgoing people, all as individual observations. So, the summary statistics (i.e. the mean, std, etc.) pool three different quantities and aren't meaningful as they stand.
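One way to get more meaningful summaries would be to describe VALUE separately for each migration type. The following is a minimal sketch, assuming the five rows printed by `pop_data.head()` as stand-in data; `sample` and `summary_by_type` are names made up for this illustration.

```python
import pandas as pd

# Five rows copied from the pop_data.head() output above, so this runs on its own.
sample = pd.DataFrame({
    "Sex": ["Both sexes", "Both sexes", "Both sexes", "Male", "Male"],
    "Origin or Destination": [
        "Net migration",
        "Emigrants: All destinations",
        "Immigrants: All origins",
        "Net migration",
        "Emigrants: All destinations",
    ],
    "VALUE": [-13.7, 21.8, 8.1, -9.0, 13.1],
})

# Summarise VALUE separately for each migration type instead of pooling them.
summary_by_type = sample.groupby("Origin or Destination")["VALUE"].describe()
print(summary_by_type)
```

On the full dataset the same call would presumably be run on `pop_data`, ideally after also restricting to one value of Sex so that the "Both sexes" totals are not mixed in with the male and female rows.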

<summary style="color:blue;">
    
###### Summaries for the categorical variables:

In [8]:
pop_data.describe(include=object)

Unnamed: 0,STATISTIC Label,Country,Sex,Origin or Destination,UNIT
count,2664,2664,2664,2664,2664
unique,1,8,3,3,1
top,Estimated Migration (Persons in April),United Kingdom (1),Both sexes,Net migration,Thousand
freq,2664,333,888,888,2664


   
<summary style="color:blue;">

    
__Notes on the above:__
    
* I note that "STATISTIC Label" and "UNIT" only have one unique value each ('Estimated Migration (Persons in April)' and 'Thousand' respectively). The former indicates that the estimates refer to April of each year, and the latter that the "VALUE" column is in thousands (i.e. a VALUE of 1 means 1 thousand). I may decide to remove these columns when tidying the data as they feel a bit redundant, and capture the information they provide elsewhere.
* I'm curious about the modes for each of these. I assume they are just listed as the mode because they are the top value in the dataset, but just for peace of mind:

In [9]:
print('Value counts for "Country":\n', pop_data['Country'].value_counts(),
      '\n\nValue counts for "Sex":\n', pop_data['Sex'].value_counts(),
      '\n\nValue counts for "UNIT":\n', pop_data['UNIT'].value_counts())


Value counts for "Country":
 Country
United Kingdom (1)                                     333
United States                                          333
Canada                                                 333
Australia                                              333
Other countries (23)                                   333
All countries                                          333
EU14 excl Irl (UK & Ireland)                           333
EU15 to EU27 (accession countries joined post 2004)    333
Name: count, dtype: int64 

Value counts for "Sex":
 Sex
Both sexes    888
Male          888
Female        888
Name: count, dtype: int64 

Value counts for "UNIT":
 UNIT
Thousand    2664
Name: count, dtype: int64


<summary style="color:blue;">

My initial instinct was correct: every value appears equally often (333 rows per country, 888 per sex), so there is no true mode, and these are just listed as the modes because they were the first value in the dataset.
    
    
I also wanted to confirm the values for the Sex and UNIT columns, so I added them to the code block above instead of running individual `.unique()` code blocks, which keeps things a bit tidier. I can foresee that "Both sexes" probably contains the sum of the values for male and female, and that "All countries" is potentially the sum of all the other countries. I will have to confirm that this is the case before I exclude them, as otherwise they would be double-counted in the analysis. I think they will be useful in their own right, and I can use them to produce summary graphs, but for later analysis they are probably better removed.
    
    
I dislike these labels for countries - may have to amend and change to UK, US, Canada, Australia, Other, All countries, EU14 (Excl UK, IE), EU 15-27. I will do this as part of my attempt to make the data easier to work with.

<summary style="color:blue;">

# 2. Data Tidying
Ideas for tidying:
* make some things lower case?
* Drop the unnecessary "Statistic Label" column
* rename cols, e.g. instead of "Origin or Destination" change it to migration_type
* Change the strings under the "Country" column
    


<summary style="color:blue;">

###### 2.1 Dropping "Statistic Label" and "UNIT" columns 
The motivation for removing "Statistic Label" is that it serves no purpose in the analysis that will be conducted on this dataframe. I will instead capture the information this column provides (i.e. that the values refer to April of each year) in the output/report.

I'm also removing the "UNIT" column, as I would rather reflect the unit in the "VALUE" column itself (by multiplying the values by a thousand).

Doing this should make the dataframe less clunky and easier to explore.
    

In [10]:
pop_data = pop_data.drop(columns = ['STATISTIC Label', 'UNIT'])

# Making sure it was a success:
pop_data.head()

Unnamed: 0,Year,Country,Sex,Origin or Destination,VALUE
0,1987,United Kingdom (1),Both sexes,Net migration,-13.7
1,1987,United Kingdom (1),Both sexes,Emigrants: All destinations,21.8
2,1987,United Kingdom (1),Both sexes,Immigrants: All origins,8.1
3,1987,United Kingdom (1),Male,Net migration,-9.0
4,1987,United Kingdom (1),Male,Emigrants: All destinations,13.1


<summary style="color:blue;">

###### 2.2 Renaming the column headings
This is just to make it easier to write code, and also to make headings reflect the data their respective column contains more accurately. I also want to make them all lowercase so it's more convenient to type.

In [11]:
pop_data = pop_data.rename(columns = {'Year':'year', 
                                      'Country':'country', 
                                      'Sex':'sex', 
                                      'Origin or Destination':'migration_type', 
                                      'VALUE':'total_migration'})
pop_data.columns

Index(['year', 'country', 'sex', 'migration_type', 'total_migration'], dtype='object')

<summary style="color:blue;">

###### 2.3 Fixing data in the 'total_migration' column
As I removed the "UNIT" column, which indicated that the number in the now-'total_migration' column is in thousands, I now need to multiply the column by 1000 to reflect the thousands.

In [12]:
pop_data['total_migration'] = (pop_data['total_migration'] * 1000)
pop_data.head()

Unnamed: 0,year,country,sex,migration_type,total_migration
0,1987,United Kingdom (1),Both sexes,Net migration,-13700.0
1,1987,United Kingdom (1),Both sexes,Emigrants: All destinations,21800.0
2,1987,United Kingdom (1),Both sexes,Immigrants: All origins,8100.0
3,1987,United Kingdom (1),Male,Net migration,-9000.0
4,1987,United Kingdom (1),Male,Emigrants: All destinations,13100.0


<summary style="color:blue;">

###### 2.4 Shortening the values under "country"
I want to make them a bit shorter and hopefully easier to work with. As there are widely known and well-defined abbreviations for some of the countries (e.g. UK for United Kingdom),  I think it's appropriate.
    
First, I just want to have the country values on-hand. Then, I'll change the strings. I decided to do this manually as I was struggling to understand the explanations on stackoverflow. I called `.unique()` and it's useful to see the change in output for the next two code blocks:

In [13]:
pop_data['country'].unique()

array(['United Kingdom (1)', 'United States', 'Canada', 'Australia',
       'Other countries (23)', 'All countries',
       'EU14 excl Irl (UK & Ireland)',
       'EU15 to EU27 (accession countries joined post 2004)'],
      dtype=object)

In [14]:
#Countries = pd.DataFrame[{'country':['UK', 'US', 'Canada', 'Australia', 'Other Countries', ] }

# regex=False treats each pattern as a literal string, so the parentheses are not read as regex groups
pop_data['country'] = pop_data['country'].str.replace('United Kingdom (1)', 'UK', regex=False)
pop_data['country'] = pop_data['country'].str.replace('United States', 'US', regex=False)
pop_data['country'] = pop_data['country'].str.replace('Other countries (23)', 'Other countries', regex=False)
# The UK is no longer in the EU, but it was for much of the period this data covers
pop_data['country'] = pop_data['country'].str.replace('EU14 excl Irl (UK & Ireland)', 'EU14 (Excl UK, IRE)', regex=False)
pop_data['country'] = pop_data['country'].str.replace('EU15 to EU27 (accession countries joined post 2004)', 'EU15 to EU27', regex=False)

pop_data['country'].unique()

array(['UK', 'US', 'Canada', 'Australia', 'Other countries',
       'All countries', 'EU14 (Excl UK, IRE)', 'EU15 to EU27'],
      dtype=object)
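As an alternative to the chained `.str.replace()` calls, the relabelling could be sketched with a single dictionary passed to `Series.replace`. This is only an illustration: `countries`, `country_labels` and `renamed` are hypothetical names, and the series below just reproduces the labels returned by `.unique()` above.

```python
import pandas as pd

# Original labels, as returned by pop_data['country'].unique() above.
countries = pd.Series([
    "United Kingdom (1)", "United States", "Canada", "Australia",
    "Other countries (23)", "All countries",
    "EU14 excl Irl (UK & Ireland)",
    "EU15 to EU27 (accession countries joined post 2004)",
])

# Hypothetical mapping; the keys must match the original strings exactly.
country_labels = {
    "United Kingdom (1)": "UK",
    "United States": "US",
    "Other countries (23)": "Other countries",
    "EU14 excl Irl (UK & Ireland)": "EU14 (Excl UK, IRE)",
    "EU15 to EU27 (accession countries joined post 2004)": "EU15 to EU27",
}

# Series.replace with a dict swaps whole values; labels not in the dict stay unchanged.
renamed = countries.replace(country_labels)
print(renamed.tolist())
```

Keeping all the old and new labels in one place makes it easier to spot a typo in a key, since a mistyped key simply leaves that label unchanged.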

<summary style="color:blue;">

###### 2.5 Changing the values under "migration_type"

Same as above. I'd like to shorten them to make it easier to call.

In [43]:
pop_data['migration_type'].unique()

['Net migration', 'Emigrants: All destinations', 'Immigrants: All origins']
Categories (3, object): ['Emigrants: All destinations', 'Immigrants: All origins', 'Net migration']

In [46]:
pop_data['migration_type'] = pop_data['migration_type'].str.replace('Emigrants: All destinations', 'Emigration')
pop_data['migration_type'] = pop_data['migration_type'].str.replace('Immigrants: All origins', 'Immigration')

pop_data['migration_type'].unique()

array(['Net migration', 'Emigration', 'Immigration'], dtype=object)

<summary style="color:blue;">

###### 2.6 Amending data types

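The year, country, sex and migration type columns are categorical, as each takes its values from a finite list. For now I only convert `migration_type` (the other conversions are commented out below), and confirm using `.info()`: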

In [48]:
#pop_data['country'] = pop_data['country'].astype('category')
#pop_data['sex'] = pop_data['sex'].astype('category')
pop_data['migration_type'] = pop_data['migration_type'].astype('category')
#pop_data['year'] = pop_data['year'].astype('category')


pop_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2664 entries, 0 to 2663
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   year             2664 non-null   int64   
 1   country          2664 non-null   object  
 2   sex              2664 non-null   object  
 3   migration_type   2664 non-null   category
 4   total_migration  2104 non-null   float64 
dtypes: category(1), float64(1), int64(1), object(2)
memory usage: 86.1+ KB
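If the remaining columns are converted later, `astype` also accepts a dictionary, so several columns can be changed in one call. A minimal sketch, assuming a three-row stand-in for `pop_data` built from values shown in the outputs of this notebook (`sample` is a made-up name):

```python
import pandas as pd

# Small stand-in for pop_data after renaming (UK, 1987, net migration by sex).
sample = pd.DataFrame({
    "year": [1987, 1987, 1987],
    "country": ["UK", "UK", "UK"],
    "sex": ["Both sexes", "Male", "Female"],
    "migration_type": ["Net migration"] * 3,
    "total_migration": [-13700.0, -9000.0, -4700.0],
})

# Convert several columns at once; year is left as an integer so it can still be used numerically.
sample = sample.astype({
    "country": "category",
    "sex": "category",
    "migration_type": "category",
})
print(sample.dtypes)
```

It is probably best to do this after all the string edits, since the `.unique()` output above suggests that `.str.replace()` hands back plain object values rather than a category.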


<summary style="color:blue;">

###### 2.7 Missing Values

Probably the most important aspect is dealing with the missing values. This data is likely missing because it was never collected or published in the first place. The gaps seem to be concentrated in the older years, so the missingness appears to depend on an observed variable (year). That is closer to missing at random (MAR) than missing completely at random (MCAR), though I still need to confirm the pattern.
* For this reason, it might be appropriate to delete all of the rows containing empty values, OR to limit the range of the analysis (e.g. limit it to 2010 -> present)
* Another option is to fill each missing value with the mean of the values above and below it (similar to linear interpolation). I will have to investigate and decide.
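Both options can be explored before committing to one. The sketch below uses made-up numbers (not CSO figures) and hypothetical names (`toy`, `missing_per_year`, `filled`). It counts the missing values per year, then fills interior gaps by linear interpolation within each country.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; these are not CSO figures.
toy = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003],
    "country": ["A"] * 4 + ["B"] * 4,
    "total_migration": [1000.0, np.nan, 3000.0, 4000.0, np.nan, 500.0, np.nan, 900.0],
})

# How many values are missing in each year? This helps decide whether to restrict the year range.
missing_per_year = toy["total_migration"].isna().groupby(toy["year"]).sum()
print(missing_per_year)

# Linear interpolation within each country: a single interior gap becomes the mean of its neighbours.
# limit_area="inside" leaves leading/trailing gaps empty rather than extrapolating.
toy = toy.sort_values(["country", "year"])
toy["filled"] = toy.groupby("country")["total_migration"].transform(
    lambda s: s.interpolate(method="linear", limit_area="inside")
)
print(toy)
```

Gaps at the start of a series (such as the older years here) are left empty by this approach, so restricting the year range may still be needed alongside it.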

<summary style="color:blue;">

# 3. Continuing inspection    
    


<summary style="color:blue;">

###### 3.1 Looking at the values I suspect are the sum of the other values in their respective columns:

###### 3.1.1 "All countries"    
    
Concerned about "All countries" under the Country column, is it a total? 
First, I created a new dataframe called `checking_pop_data` to isolate the data for a particular year, choosing a recent year as the data is most likely to have no missing values. But, I still ran `.info()` to confirm.     
    

<details>
    <summary style="display:list-item; font-size:16px; color:blue;"><i>Failed code, for my own reference, so I don't make the same silly mistakes</i></summary>
Below are all my failed attempts at trying to isolate the column - I realised I was overthinking it and went back to basics (in the code block below). All of these obviously failed as I wasn't sure what I was doing and was overcomplicating. 


`print((pop_data['Year'] == 2020))`
      

`for year in pop_data['Year']:`\
    `if year == 2020:`\
    `print(pop_data['Country'], pop_data['VALUE'])`
            
`pop_data.loc[2020:2021, 'Country':'VALUE']`
    
`pop_data.loc[pop_data['Year' == 2020]]`
    
`pop_data.loc['Year', 2020]`
    


</details>

In [16]:
checking_pop_data = pop_data[pop_data['year'] == 2020]
checking_pop_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 72 entries, 2376 to 2447
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   year             72 non-null     int64   
 1   country          72 non-null     object  
 2   sex              72 non-null     object  
 3   migration_type   72 non-null     category
 4   total_migration  72 non-null     float64 
dtypes: category(1), float64(1), int64(1), object(2)
memory usage: 3.0+ KB


<summary style="color:blue;">
From the above, I'm happy that there are no missing values, so I can proceed to check the totals for "All countries" against the total of the other countries.
To do this, I'll loop through each row, summing the total migration values for the individual countries and comparing the result to the sum for "All countries":

In [17]:
Countries = ['UK', 'US', 'Canada', 'Australia', 'Other countries', 'EU14 (Excl UK, IRE)', 'EU15 to EU27']

sum_countries = 0
sum_all_countries = 0


for index, row in checking_pop_data.iterrows():
    if row['country'] in Countries:
        sum_countries += row['total_migration']
    elif row['country'] == 'All countries':
        sum_all_countries += row['total_migration']
        
print("Sum for selected countries:", sum_countries)
print("Sum for 'all countries':", sum_all_countries)

Sum for selected countries: 382400.0
Sum for 'all countries': 382400.0


<summary style="color:blue;">

I'm satisfied that this confirms that the data for "All countries" is the sum of all of the other countries in the dataset.
    
To explain the loop I wrote, I first defined two variables to store the sum of the total_migration values for both "All countries" and all of the other countries. Then, I used `.iterrows()` from pandas to iterate through each row. I was inspired by the answer to <a href="https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas">this post</a>  on stackoverflow, and added my own code after it.
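The same check could also be done without a loop, and for every year at once, using `groupby`. This is a sketch on made-up numbers rather than the CSO data; `toy`, `is_total`, `sum_of_countries`, `reported_total` and `difference` are names invented for the example.

```python
import pandas as pd

# Made-up numbers for illustration; the real check would use pop_data.
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020],
    "country": ["UK", "US", "All countries"] * 2,
    "total_migration": [100.0, 50.0, 150.0, 80.0, 30.0, 110.0],
})

is_total = toy["country"] == "All countries"

# Sum the individual countries per year, and pick out the reported total per year.
sum_of_countries = toy[~is_total].groupby("year")["total_migration"].sum()
reported_total = toy[is_total].groupby("year")["total_migration"].sum()

# A non-zero difference would flag a year where the total does not match.
difference = (reported_total - sum_of_countries).round(1)
print(difference)
```

On the real data, years with missing country values would be expected to show a difference, because `sum()` skips missing values while the "All countries" figure presumably includes them.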

<summary style="color:blue;">

###### 3.1.2 "Both sexes" 

The same as above, but I'm checking to see if "Both sexes" is the sum of the "Male" and "Female" migration count. (Safe to assume yes, but better safe than sorry). I'm just going to adapt the code above:

In [18]:
sexes = ['Male', 'Female']

individual_sexes = 0
both_sexes = 0


for index, row in checking_pop_data.iterrows():
    if row['sex'] in sexes:
        individual_sexes += row['total_migration']
    elif row['sex'] == 'Both sexes':
        both_sexes += row['total_migration']
        
print("Sum for individual sexes:", individual_sexes)
print("Sum for both sexes:", both_sexes)

Sum for individual sexes: 382400.0
Sum for both sexes: 382400.0


<summary style="color:blue;">
    
I could also have checked this by simply looking at the dataset, as there are fewer rows to count. So, I'm going to manually take the UK net migration values for Both sexes, Male and Female:

In [19]:
checking_pop_data.head(10)

Unnamed: 0,year,country,sex,migration_type,total_migration
2376,2020,UK,Both sexes,Net migration,7900.0
2377,2020,UK,Both sexes,Emigrants: All destinations,9700.0
2378,2020,UK,Both sexes,Immigrants: All origins,17600.0
2379,2020,UK,Male,Net migration,3900.0
2380,2020,UK,Male,Emigrants: All destinations,4500.0
2381,2020,UK,Male,Immigrants: All origins,8400.0
2382,2020,UK,Female,Net migration,4000.0
2383,2020,UK,Female,Emigrants: All destinations,5200.0
2384,2020,UK,Female,Immigrants: All origins,9200.0
2385,2020,US,Both sexes,Net migration,700.0


In [20]:
individual_sexes = 3900 + 4000
both_sexes = 7900

print("Sum for individual sexes:", individual_sexes)
print("Sum for both sexes:", both_sexes)

Sum for individual sexes: 7900
Sum for both sexes: 7900


<summary style="color:blue;">

Double confirmation that both sexes is the sum of the values for male and female. So, I'm satisfied!
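The manual check covers net migration only. A pivot would extend it to all three migration types in one go. The sketch below uses the nine UK rows for 2020 from the `checking_pop_data.head(10)` output above, with the shortened migration labels; `uk_2020`, `wide` and `matches` are hypothetical names.

```python
import pandas as pd

# UK rows for 2020, copied from the checking_pop_data.head(10) output above.
uk_2020 = pd.DataFrame({
    "sex": ["Both sexes"] * 3 + ["Male"] * 3 + ["Female"] * 3,
    "migration_type": ["Net migration", "Emigration", "Immigration"] * 3,
    "total_migration": [7900.0, 9700.0, 17600.0,
                        3900.0, 4500.0, 8400.0,
                        4000.0, 5200.0, 9200.0],
})

# One row per migration type, one column per sex.
wide = uk_2020.pivot(index="migration_type", columns="sex", values="total_migration")

# True where Male + Female equals the published "Both sexes" figure.
wide["matches"] = (wide["Male"] + wide["Female"]).round(1) == wide["Both sexes"].round(1)
print(wide)
```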

<summary style="color:blue;">

# 4. Data Preparation and Visualisation 

<summary style="color:blue;">

###### 4.1 Looking at migration vs sex

I was doing this manually in the subsections below, but I was repeating a lot of code so I decided to attempt to write a function to clean it up a bit. I left _Section 4.1.1_ as it is so I can compare the output from my function to it, and then I can speed up the rest of the comparisons.

In [56]:
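def filter_migration_by_sex_and_type(dataframe, sex_to_filter, migration_type_to_filter):
    """Return the rows of `dataframe` matching the given sex and migration type."""
    # Build the masks from the dataframe passed in, not from the global pop_data
    specific_sex = (dataframe["sex"] == sex_to_filter)
    specific_migration_type = (dataframe["migration_type"] == migration_type_to_filter)
    filtered_data = dataframe[specific_sex & specific_migration_type]
    return filtered_data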

### Note to self: change the name of the function and the output

In [55]:
# Example usage to filter for "Both sexes" and "Emigration":
filtered_data = filter_migration_by_sex_and_type(pop_data, sex_to_filter="Both sexes", migration_type_to_filter="Emigration")
filtered_data.head()

Unnamed: 0,year,country,sex,migration_type,total_migration
1,1987,UK,Both sexes,Emigration,21800.0
10,1987,US,Both sexes,Emigration,9900.0
19,1987,Canada,Both sexes,Emigration,
28,1987,Australia,Both sexes,Emigration,
37,1987,Other countries,Both sexes,Emigration,5400.0


<summary style="color:blue;">

###### 4.1.1 Looking at Net Migration for Both Sexes

Initial thoughts and overview: This dataset is a count of the total number of emigrants and immigrants in any given year, to and from a given country. It's a bit inconvenient to have the net migration plus the number of immigrants and emigrants in the same column, I think, but I should be able to find a workaround without making additional unnecessary columns. But first, I want to make use of the three types of `migration_type` and visualise them.
    
First, I'll focus on "Net migration": extract all instances of net migration and move them into a new dataframe, making sure to bring all other columns along. I've been learning Python on Codecademy.com, and one of the concepts covered was boolean masks, so I made use of them for this task. These work by marking each row as True or False based on whether or not a condition holds (in this case, whether the row refers to net migration, and to both sexes). Then I can create a new dataframe from the rows where all of the conditions are True.

In [40]:
net_migration = (pop_data["migration_type"] == "Net migration")  
both_sexes = (pop_data["sex"] == "Both sexes") 
countries_for_inclusion = (pop_data["country"] != "All countries") 
net_migration_both_sexes = pop_data[net_migration & both_sexes & countries_for_inclusion]

net_migration_both_sexes.head()

Unnamed: 0,year,country,sex,migration_type,total_migration
0,1987,UK,Both sexes,Net migration,-13700.0
9,1987,US,Both sexes,Net migration,-6900.0
18,1987,Canada,Both sexes,Net migration,
27,1987,Australia,Both sexes,Net migration,
36,1987,Other countries,Both sexes,Net migration,-1400.0


<summary style="color:blue;"> 
    
I want to make sure the countries are correct:

In [42]:
net_migration_both_sexes['country'].unique()

array(['UK', 'US', 'Canada', 'Australia', 'Other countries',
       'EU14 (Excl UK, IRE)', 'EU15 to EU27'], dtype=object)

<summary style="color:blue;"> 
    
I'm happy they are, so I can proceed with having a look at this new dataset, which only contains the net migration total for each country (excluding the sum of all countries) for both sexes.

In [36]:
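# `net_migration` is only the boolean mask; the filtered dataframe is `net_migration_both_sexes`
net_migration_both_sexes["total_migration"].describe()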

count      199.000000
mean      3148.743719
std      10932.233219
min     -34200.000000
25%      -1200.000000
50%       1200.000000
75%       5700.000000
max      64900.000000
Name: total_migration, dtype: float64

In [37]:
# up to date look at the data..
net_migration_both_sexes.info()

<class 'pandas.core.frame.DataFrame'>
Index: 259 entries, 0 to 2655
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   year             259 non-null    int64   
 1   country          259 non-null    object  
 2   sex              259 non-null    object  
 3   migration_type   259 non-null    category
 4   total_migration  199 non-null    float64 
dtypes: category(1), float64(1), int64(1), object(2)
memory usage: 10.5+ KB


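Now the statistical info for the migration will be much more meaningful. For comparison, the next output is from an earlier run that, judging by the counts (236 non-null values rather than 199), still appears to include the "All countries" rows, which stretches the min and max:

In [25]:
# Net migration for both sexes with "All countries" left in
pop_data[net_migration & both_sexes]["total_migration"].describe()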

count       236.000000
mean       5311.016949
std       17635.191046
min      -43900.000000
25%       -1400.000000
50%        1550.000000
75%        7250.000000
max      104800.000000
Name: total_migration, dtype: float64

In [26]:
# year is still int64 (its category conversion is commented out above), so describe() reports it alongside total_migration:
pop_data.describe()

Unnamed: 0,year,total_migration
count,2664.0,2104.0
mean,2005.0,8943.726236
std,10.679083,15513.702887
min,1987.0,-43900.0
25%,1996.0,1800.0
50%,2005.0,4700.0
75%,2014.0,10200.0
max,2023.0,151100.0


<summary style="color:blue;">

###### 4.1.2 Looking at Net Migration for Individual sexes

I need to ensure I get rid of the "All countries" data as well. Same approach as above in Section 4.1.1, but I've already defined the `countries_for_inclusion` variable, so no need to do that again.

In [38]:
net_migration_sex = (pop_data["migration_type"] == "Net migration")  
individual_sexes = (pop_data["sex"] == "Male") | (pop_data["sex"] == "Female")

net_migration_by_sex = pop_data[net_migration_sex & individual_sexes & countries_for_inclusion]

net_migration_by_sex.head()

Unnamed: 0,year,country,sex,migration_type,total_migration
3,1987,UK,Male,Net migration,-9000.0
6,1987,UK,Female,Net migration,-4700.0
12,1987,US,Male,Net migration,-3600.0
15,1987,US,Female,Net migration,-3500.0
21,1987,Canada,Male,Net migration,


In [31]:
#first_graph = net_migration[['year','country', 'total_migration']]

#net_migration = sns.catplot(data=first_graph, kind="bar", x="year", y="country", height = 6, aspect = 2)

TypeError: 'FacetGrid' object is not subscriptable

In [28]:
#attempt_2 = sns.boxplot(x="country", y="total_migration", data=net_migration)
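The `TypeError` above is what I would expect if the name `net_migration` had been reassigned to the object returned by `sns.catplot` (a `FacetGrid`) and then indexed like a dataframe. Giving the plot its own name avoids that. Below is a minimal sketch of a line plot per country using matplotlib only; the values are mostly made up for illustration, and `toy`, `fig` and `ax` are names chosen for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Mostly made-up values for illustration; the real input would be net_migration_both_sexes.
toy = pd.DataFrame({
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "country": ["UK", "UK", "UK", "US", "US", "US"],
    "total_migration": [5000.0, 6500.0, 7900.0, 1200.0, 900.0, 700.0],
})

# Keep the figure and axes under their own names so the data is not overwritten.
fig, ax = plt.subplots(figsize=(8, 4))

# Draw one line per country.
for country, group in toy.groupby("country"):
    ax.plot(group["year"], group["total_migration"], marker="o", label=country)

ax.set_xlabel("year")
ax.set_ylabel("net migration (persons)")
ax.legend(title="country")
```

A line per country keeps year on the x-axis, which seems a more natural fit for migration over time than the bar chart with country on the y-axis attempted above.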