# Data Analysis in Python Workshop
André Guerra, andre.guerra@mail.mcgill.ca \
April, 2021 \
<u>Description:</u> This workshop focuses on data analysis techniques in Python.

This notebook in the series covers techniques for visualizing data with two popular packages: plotly and matplotlib.

___
## Jupyter notebooks
Some useful keyboard shortcuts for Jupyter notebooks.

In command mode (outside a cell; press <b>Esc</b> to get there):
- <b>A</b>: insert a cell above the current cell
- <b>B</b>: insert a cell below the current cell
- <b>D,D</b>: delete the current cell
- <b>M</b>: change the current cell to markdown
- <b>Y</b>: change the current cell to code

In edit mode (inside a cell; press <b>Enter</b> to get there):
- <b>shift + enter</b>: run the cell
- <b>cmd + /</b> (<b>ctrl + /</b> on Windows/Linux): comment/uncomment the selected line(s)

___
## Import statement(s)
Import the packages and assign each one to a short alias for use in our code.

```python
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

# Show floating-point values with 3 decimal places in dataframe output
pd.set_option("display.precision", 3)
```

### Read in the data from file

```python
heart_file = "1_data/heart.csv"
heart_DF = pd.read_csv(heart_file)

# Reading .xlsx files requires the openpyxl package to be installed
covid19_file = "1_data/covid_19_data.xlsx"
covid_DF = pd.read_excel(covid19_file)
```
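Since `1_data/heart.csv` may not be available to every reader, here is a minimal sketch of the same `read_csv` step on an in-memory copy of the first two rows shown in the preview below, with `io.StringIO` standing in for the file. The name `heart_sample` is hypothetical. Checking `shape` and the inferred column types right after a read is a quick way to catch a wrong delimiter or a missing header.

```python
import io

import pandas as pd

# In-memory stand-in for 1_data/heart.csv (first two data rows of the preview)
csv_text = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
"""

heart_sample = pd.read_csv(io.StringIO(csv_text))

# Sanity checks after reading: dimensions and inferred column types
print(heart_sample.shape)             # (rows, columns)
print(heart_sample["oldpeak"].dtype)  # decimal values are parsed as floats
print(heart_sample["age"].dtype)      # whole numbers are parsed as integers
```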

### Preview the data

```python
# Display the first five rows
heart_DF.head()
```

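|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |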


## Plotly - 3D scatter plot

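```python
# Interactive 3D scatter: age vs. resting blood pressure (trestbps) vs. cholesterol (chol),
# coloured by sex; hovering over a point also shows its target value
f = px.scatter_3d(heart_DF,
                  x='age',
                  y='trestbps',
                  z='chol',
                  color='sex',
                  hover_data=['target'])
```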

### Save the figure as an .html file

```python
# The 4_figures/ folder must already exist
f.write_html('4_figures/heart_DF_plot3d.html')
```

```python
covid_DF.head()
```

|   | SNo | ObservationDate | Province/State | Country/Region | Last Update | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 01/22/2020 | Anhui | Mainland China | 1/22/2020 17:00 | 1 | 0 | 0 |
| 1 | 2 | 01/22/2020 | Beijing | Mainland China | 1/22/2020 17:00 | 14 | 0 | 0 |
| 2 | 3 | 01/22/2020 | Chongqing | Mainland China | 1/22/2020 17:00 | 6 | 0 | 0 |
| 3 | 4 | 01/22/2020 | Fujian | Mainland China | 1/22/2020 17:00 | 1 | 0 | 0 |
| 4 | 5 | 01/22/2020 | Gansu | Mainland China | 1/22/2020 17:00 | 0 | 0 | 0 |


```python
# Gapminder sample dataset bundled with plotly
gdp_df = px.data.gapminder()
gdp_df
```

|   | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.101 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981 | AFG | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1699 | Zimbabwe | Africa | 1987 | 62.351 | 9216418 | 706.157 | ZWE | 716 |
| 1700 | Zimbabwe | Africa | 1992 | 60.377 | 10704340 | 693.421 | ZWE | 716 |
| 1701 | Zimbabwe | Africa | 1997 | 46.809 | 11404948 | 792.450 | ZWE | 716 |
| 1702 | Zimbabwe | Africa | 2002 | 39.989 | 11926563 | 672.039 | ZWE | 716 |


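```python
# Bubble chart for 2007: GDP per capita (log scale) vs. life expectancy,
# bubble size = population, colour = continent
f = px.scatter(gdp_df.query("year == 2007"), x="gdpPercap", y="lifeExp",
               size="pop", color="continent",
               hover_name="country", log_x=True, size_max=60)

f.write_html('4_figures/GDP_data.html')
```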

```python
# Montreal mayoral election results by district, bundled with plotly
elec_df = px.data.election()
elec_df.head()
```

|   | district | Coderre | Bergeron | Joly | total | winner | result | district_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 101-Bois-de-Liesse | 2481 | 1829 | 3024 | 7334 | Joly | plurality | 101 |
| 1 | 102-Cap-Saint-Jacques | 2525 | 1163 | 2675 | 6363 | Joly | plurality | 102 |
| 2 | 11-Sault-au-Récollet | 3348 | 2770 | 2532 | 8650 | Coderre | plurality | 11 |
| 3 | 111-Mile-End | 1734 | 4782 | 2514 | 9030 | Bergeron | majority | 111 |
| 4 | 112-DeLorimier | 1770 | 5933 | 3044 | 10747 | Bergeron | majority | 112 |


```python
# Restaurant tips sample dataset bundled with plotly
tips_df = px.data.tips()
tips_df.head()
```

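|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |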


```python
# Tip vs. total bill; marker size = party size, colour = lunch or dinner
f = px.scatter(tips_df, x="total_bill", y="tip",
               size="size", color="time")

f.write_html('4_figures/tips_data.html')
```

```python
# Car-sharing sample dataset bundled with plotly (locations in Montreal)
cars_df = px.data.carshare()
cars_df.head()
```

|   | centroid_lat | centroid_lon | car_hours | peak_hour |
|---|---|---|---|---|
| 0 | 45.472 | -73.589 | 1772.75 | 2 |
| 1 | 45.544 | -73.562 | 986.333 | 23 |
| 2 | 45.488 | -73.643 | 354.75 | 20 |
| 3 | 45.523 | -73.596 | 560.167 | 23 |
| 4 | 45.454 | -73.739 | 2836.667 | 19 |


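```python
# Map of car-share locations: colour = peak hour on a cyclical scale (hour 23 sits next to hour 0),
# size = car hours; the "carto-positron" base map does not need a Mapbox access token
f = px.scatter_mapbox(cars_df, lat="centroid_lat", lon="centroid_lon",
                      color="peak_hour", size="car_hours",
                      color_continuous_scale=px.colors.cyclical.IceFire,
                      size_max=15, zoom=10, mapbox_style="carto-positron")

f.write_html('4_figures/car_share.html')
```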