
# A Short Look at The JET Program Participant Countries
## By Participants and Countries of Origin
## Focusing on Plotly Express Visualizations
![JET Program](https://www.nz.emb-japan.go.jp/images/Jet_logo_small.jpg)

## Introduction: The JET Program 
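The JET (Japan Exchange and Teaching) Program is a Japanese government exchange program that places participants from abroad in Japanese communities. Most work as Assistant Language Teachers (ALTs) in primary and secondary schools, typically teaching English; others serve as Coordinators for International Relations (CIRs) in local government offices or as Sports Exchange Advisors (SEAs). These three roles correspond to the ALT, CIR, and SEA columns in the data below.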



## Notebook Focus:
I will practice a few new types of visualizations for this homework.
I would like to create:
- A pie chart
- A world heatmap (a choropleth map of participants by country)
- Some other visualizations, if possible.

## 1. Import Packages for this Analysis
This notebook uses only pandas and plotly.express, since it is a short notebook focused on a few specific visualizations.

```python
# Import packages for this short analysis
import pandas as pd
import plotly.express as px
```

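```python
# Upgrade Plotly so that the newer chart types and options are available.
# Run this before importing plotly (or restart the runtime afterwards):
# upgrading does not affect a module that has already been imported.
!pip install --upgrade plotly
```

```
Requirement already up-to-date: plotly in /usr/local/lib/python3.6/dist-packages (4.12.0)
```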


## 2. Web Scrape the Data
The JET Programme website publishes participant statistics by country of origin:
http://jetprogramme.org/en/countries/

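```python
# Scrape the tables from the website.
# pd.read_html returns a list of DataFrames, one per <table> on the page,
# and needs an HTML parser (lxml, or BeautifulSoup4 + html5lib) installed.
site = 'http://jetprogramme.org/en/countries/'
df = pd.read_html(site)
```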

```python
# Use the first table found on the page
df_jet = df[0]
```

## 3. Let's Look at the Head, Tail, and a Random Sample of the Data


```python
# Let's look at the first 5 countries
df_jet.head(5)
```

|   | Country | ALT | CIR | SEA | Total |
|---|---|---|---|---|---|
| 0 | United States | 2958 | 145 | 2 | 3105 |
| 1 | United Kingdom | 528 | 32 | 0 | 560 |
| 2 | Australia | 321 | 22 | 0 | 343 |
| 3 | New Zealand | 236 | 12 | 3 | 251 |
| 4 | Canada | 531 | 26 | 0 | 557 |


```python
# Let's look at the last 20 rows
df_jet.tail(20)
```

|   | Country | ALT | CIR | SEA | Total |
|---|---|---|---|---|---|
| 44 | Kingdom of Tonga | 1 | 0 | 0 | 1 |
| 45 | Vietnam | 0 | 7 | 0 | 7 |
| 46 | Saint Vincent and the Grenadines | 2 | 0 | 0 | 2 |
| 47 | Uzbekistan | 0 | 2 | 0 | 2 |
| 48 | Seychelles | 1 | 0 | 0 | 1 |
| 49 | Croatia | 0 | 1 | 0 | 1 |
| 50 | United Republic of Tanzania | 0 | 0 | 1 | 1 |
| 51 | Republic of Malta | 2 | 0 | 0 | 2 |
| 52 | Republic of Estonia | 4 | 0 | 0 | 4 |
| 53 | Republic of Lithuania | 0 | 2 | 0 | 2 |


```python
# Let's sample 15 random rows
df_jet.sample(15)
```

|   | Country | ALT | CIR | SEA | Total |
|---|---|---|---|---|---|
| 45 | Vietnam | 0 | 7 | 0 | 7 |
| 6 | France | 4 | 23 | 0 | 27 |
| 51 | Republic of Malta | 2 | 0 | 0 | 2 |
| 2 | Australia | 321 | 22 | 0 | 343 |
| 40 | Kingdom of Sweden | 2 | 0 | 0 | 2 |
| 17 | Argentina | 0 | 1 | 0 | 1 |
| 18 | Belgium | 0 | 1 | 0 | 1 |
| 38 | Trinidad and Tobago | 61 | 0 | 0 | 61 |
| 14 | Italy | 0 | 2 | 0 | 2 |
| 42 | Latvia | 0 | 2 | 0 | 2 |


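## Fix Some of the Data
The tail shows that the scraped table is really three tables stacked together. The first 57 rows (indices 0 to 56) hold one row per country. They are followed by two summary rows: the total across all countries, and a sub-header row reading "Totals By Years of Programme". The last 5 rows then break all participants down by how many years they have been on the programme (1st to 5th year), not by calendar year. I'll split these into three dataframes.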

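```python
# Keep df_jet as-is, but split it into 3 dataframes by position.
# The "Totals By Years of Programme" sub-header row puts text into the numeric
# columns, so pandas will typically read ALT/CIR/SEA/Total as text; convert back.
num_cols = ['ALT', 'CIR', 'SEA', 'Total']

# Everything except the last 7 rows: one row per country
df_countries = df_jet[:-7].copy()
df_countries[num_cols] = df_countries[num_cols].apply(pd.to_numeric)

# The 7th- and 6th-to-last rows: the overall total and the sub-header
df_totals = df_jet[-7:-5]

# The last 5 rows: participants by year on the programme
df_years = df_jet[-5:].copy()
df_years[num_cols] = df_years[num_cols].apply(pd.to_numeric)
```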
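Slicing by position (`[:-7]`, `[-7:-5]`, `[-5:]`) depends on the table keeping exactly this layout; if the site adds a country or a year row, the slices shift. A minimal sketch of splitting at the summary rows by label instead, assuming the two label strings stay exactly as they appear in the outputs of this notebook. `split_jet_table` is a hypothetical helper, and the miniature table is built from a few values shown in those outputs, not from the real scrape:

```python
import pandas as pd


def split_jet_table(table: pd.DataFrame):
    """Hypothetical helper: split the scraped JET table at its summary rows.

    Assumes both labels below appear in the Country column exactly as shown
    in the notebook's outputs; list.index raises ValueError if one is missing.
    """
    labels = table["Country"].tolist()
    total_pos = labels.index("Total Participants from All Countries")
    header_pos = labels.index("Totals By Years of Programme")

    countries = table.iloc[:total_pos]               # one row per country
    totals = table.iloc[total_pos:header_pos + 1]    # overall total + sub-header
    years = table.iloc[header_pos + 1:]              # by year on the programme
    return countries, totals, years


# Illustrative miniature of the scraped layout (not the real table)
demo = pd.DataFrame({
    "Country": ["United States", "Canada",
                "Total Participants from All Countries",
                "Totals By Years of Programme", "1st Year", "2nd Year"],
    "Total": ["3105", "557", "5761", "Total", "2091", "1746"],
})

countries, totals, years = split_jet_table(demo)
print(len(countries), len(totals), len(years))  # 2 2 2
```

Matching on labels fails loudly if the page layout changes, which is usually preferable to silently slicing the wrong rows.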

## Let's look at the new dataframes:


```python
# Check the tail of the country data: did any summary rows slip through?
df_countries.tail(5)
```

|   | Country | ALT | CIR | SEA | Total |
|---|---|---|---|---|---|
| 52 | Republic of Estonia | 4 | 0 | 0 | 4 |
| 53 | Republic of Lithuania | 0 | 2 | 0 | 2 |
| 54 | Federal Democratic Republic of Ethiopia | 0 | 0 | 1 | 1 |
| 55 | Republic of the Union of Myanmar | 0 | 1 | 0 | 1 |
| 56 | Republic of Chile | 0 | 1 | 0 | 1 |


```python
# The overall total row, plus the sub-header row for the by-year breakdown
df_totals.head(5)
```

|   | Country | ALT | CIR | SEA | Total |
|---|---|---|---|---|---|
| 57 | Total Participants from All Countries | 5234 | 514 | 13 | 5761 |
| 58 | Totals By Years of Programme | ALT | CIR | SEA | Total |
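The second summary row repeats the column names (`ALT`, `CIR`, `SEA`, `Total`) as values, so those columns cannot all be parsed as integers. A small sketch of the effect and of one fix, using a few rows copied from the outputs above as a stand-in for the real scrape. Here `pd.to_numeric(errors="coerce")` turns the text into NaN, so the sub-header row can then be dropped:

```python
import io

import pandas as pd

# A few rows copied from the outputs above, including the sub-header row.
# This stands in for the scraped table; it is not the full data.
csv_text = """Country,ALT,CIR,SEA,Total
United States,2958,145,2,3105
Total Participants from All Countries,5234,514,13,5761
Totals By Years of Programme,ALT,CIR,SEA,Total
1st Year,1885,203,3,2091
"""
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.dtypes)  # the numeric columns come back as text, not integers

num_cols = ["ALT", "CIR", "SEA", "Total"]
cleaned = sample.copy()
# errors="coerce" turns the sub-header's text into NaN instead of raising
cleaned[num_cols] = cleaned[num_cols].apply(pd.to_numeric, errors="coerce")
# Drop the sub-header row (now NaN) and restore integer columns
cleaned = cleaned.dropna(subset=num_cols).astype({c: int for c in num_cols})
print(cleaned)
```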


## Visualizations with plotly.express
Let's start with a bar chart of ALT participants by year on the programme.

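```python
# Bar chart of ALTs by year on the programme; hovering shows the other columns
fig = px.bar(df_years,
             x='Country',  # in df_years this column holds "1st Year" ... "5th Year"
             y='ALT',
             title="JET ALT Participants by Year on the Programme (1st to 5th Year)",
             labels={'Country': 'Year on the programme'},
             hover_data=['Total', 'CIR', 'SEA'])

fig.update_layout(title_font_size=20)

fig.show()
```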

```python
# Sort by year on the programme and renumber the index from 0
# (ignore_index=True discards the original row labels; needs pandas 1.0+)
df_years = df_years.sort_values(by="Country", ignore_index=True)
```
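Sorting the labels as plain strings gives the right order here only because every label starts with a single digit; as strings, a hypothetical "10th Year" would sort before "1st Year". A sketch of sorting on the number inside each label instead, using the `key` argument of `sort_values` (pandas 1.1+). The shuffled labels and the "10th Year" row are illustrative, not from the JET data:

```python
import pandas as pd

# Illustrative labels; "10th Year" does not appear in the JET data
years = pd.DataFrame({"Country": ["3rd Year", "10th Year", "1st Year", "2nd Year"]})

# Plain string sort: "10th Year" lands before "1st Year"
print(years.sort_values("Country")["Country"].tolist())

# Sort by the leading number extracted from each label
by_number = years.sort_values(
    "Country",
    key=lambda s: s.str.extract(r"(\d+)", expand=False).astype(int),
    ignore_index=True,
)
print(by_number["Country"].tolist())
# ['1st Year', '2nd Year', '3rd Year', '10th Year']
```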

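```python
df_years
```

|   | Country | ALT | CIR | SEA | Total |
|---|---|---|---|---|---|
| 0 | 1st Year | 1885 | 203 | 3 | 2091 |
| 1 | 2nd Year | 1602 | 138 | 6 | 1746 |
| 2 | 3rd Year | 867 | 101 | 3 | 971 |
| 3 | 4th Year | 532 | 53 | 1 | 586 |
| 4 | 5th Year | 348 | 19 | 0 | 367 |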
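Since the per-year rows split up the same participants as the overall total row, their columns should add up to that row (5234, 514, 13, 5761). A quick consistency check, with the values typed in from the `df_years` and `df_totals` outputs above:

```python
import pandas as pd

# Values copied from the df_years and df_totals outputs above
years = pd.DataFrame({
    "Country": ["1st Year", "2nd Year", "3rd Year", "4th Year", "5th Year"],
    "ALT": [1885, 1602, 867, 532, 348],
    "CIR": [203, 138, 101, 53, 19],
    "SEA": [3, 6, 3, 1, 0],
    "Total": [2091, 1746, 971, 586, 367],
})
overall = {"ALT": 5234, "CIR": 514, "SEA": 13, "Total": 5761}

# Column sums across the five years should equal the overall total row
column_sums = years[["ALT", "CIR", "SEA", "Total"]].sum()
print(column_sums.to_dict())
print(all(column_sums[col] == value for col, value in overall.items()))  # True

# Each row's Total should also equal ALT + CIR + SEA
print((years[["ALT", "CIR", "SEA"]].sum(axis=1) == years["Total"]).all())  # True
```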


```python
# Keep the year label and the three roles. 'Total' is just ALT + CIR + SEA,
# so including it would double-count in a stacked chart.
df_years = df_years[['Country', 'ALT', 'CIR', 'SEA']]
```

## Line Graph of Participants over Years

```python
# Line chart of each role (ALT, CIR, SEA) across years on the programme.
# Passing a list to y uses plotly.express's wide-form support (plotly 4.8+).
fig = px.line(df_years,
              x='Country',
              y=["ALT", "CIR", "SEA"],
              title="JET Participants by Year on the Programme (1st to 5th)",
              width=600,
              height=400)

# Show the figure
fig.show()
```

```python
# Stacked bar chart: one bar per year on the programme, stacked by role
fig = px.bar(df_years,
             x='Country',
             y=["ALT", "CIR", "SEA"],
             title="JET Participants by Year on the Programme",
             hover_data=['ALT'],
             barmode='stack')

# Show the figure
fig.show()
```
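plotly.express charts can also be built from long-form data, with one row per (year, role) pair and a column naming the role. A minimal sketch of that reshape with pandas `melt`, using the per-year values from the table above; the column names `Role` and `Participants` are my own choices, not from the data:

```python
import pandas as pd

# Wide form, as in df_years after dropping 'Total'
wide = pd.DataFrame({
    "Country": ["1st Year", "2nd Year", "3rd Year", "4th Year", "5th Year"],
    "ALT": [1885, 1602, 867, 532, 348],
    "CIR": [203, 138, 101, 53, 19],
    "SEA": [3, 6, 3, 1, 0],
})

# Long form: one row per (year, role) pair
long_df = wide.melt(id_vars="Country",
                    value_vars=["ALT", "CIR", "SEA"],
                    var_name="Role",
                    value_name="Participants")
print(long_df.head(6))
print(long_df.shape)  # (15, 3)
```

The long frame could then be plotted with `px.bar(long_df, x='Country', y='Participants', color='Role')`, which should give an equivalent stacked chart with `Role` as the legend title.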

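```python
# Where are JET participants from?
n = 10

# The site does not list countries in order of size (Canada comes after
# New Zealand), so take the n largest by Total rather than the first n rows.
# astype(int) guards against the column having been read as text.
top_countries = df_countries.astype({'Total': int}).nlargest(n, 'Total')

fig = px.pie(top_countries,
             values='Total',
             names='Country',
             title=f'JET Participants by Top-{n} Countries',
             width=900,
             height=900)

fig.show()
```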
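A pie of the top n countries shows each slice's share of those n countries only, not of all participants. A common alternative is to fold the remaining countries into a single "Other" slice so the pie still sums to the full total. A sketch of that with pandas, where `top_n_with_other` is a hypothetical helper and the five-country table uses values from the head output above:

```python
import pandas as pd


def top_n_with_other(df: pd.DataFrame, n: int, value_col: str = "Total",
                     label_col: str = "Country") -> pd.DataFrame:
    """Hypothetical helper: keep the n largest rows, sum the rest into 'Other'."""
    ordered = df.astype({value_col: int}).sort_values(value_col, ascending=False)
    top = ordered.head(n)[[label_col, value_col]]
    rest = ordered.iloc[n:][value_col].sum()
    if rest > 0:
        other = pd.DataFrame({label_col: ["Other"], value_col: [rest]})
        top = pd.concat([top, other], ignore_index=True)
    return top.reset_index(drop=True)


countries = pd.DataFrame({
    "Country": ["United States", "United Kingdom", "Australia",
                "New Zealand", "Canada"],
    "Total": [3105, 560, 343, 251, 557],
})

pie_data = top_n_with_other(countries, n=3)
print(pie_data)
```

The result can be passed straight to `px.pie(pie_data, values='Total', names='Country')`.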

Let's try a graphic that shows more detail: a sunburst chart that nests each country under its SEA, CIR, and ALT counts.

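```python
# Not a particularly useful graphic, but it definitely looks cool.
# Each ring is one level of `path`: SEA count -> CIR count -> ALT count -> Country.
top_number = 80

# Note: df_countries.head must be called; passing the method itself
# raises a ValueError. With only 57 countries, asking for the top 80
# simply returns all of them, so the title uses the actual row count.
top = df_countries.astype({'Total': int}).nlargest(top_number, 'Total')

fig = px.sunburst(top,
                  path=['SEA', 'CIR', 'ALT', 'Country'],
                  values='Total',
                  color='Total',
                  title=f"Graph of Top-{len(top)} Participant Countries in the JET Program",
                  width=900,
                  height=900)

fig.update_layout(
    font_family="Pontano Sans",
    font_size=16,
    font_color="red",
    title_font_family="Pontano Sans",
    title_font_color="black",
    title_font_size=30,
)

# Show the figure
fig.show()
```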

## Next, let's take a look at: