# Interactive Data Exploration

## Objectives

* learn how to use Jupyter notebooks
* learn data exploration with ```pandas``` and ```plotly```
* learn to import data sets from repositories such as [kaggle.com](https://www.kaggle.com/datasets)
* discover anthropogenic impact on the environment
 

### Package Import

This notebook uses the ```plotly``` package for interactive plots. The auxiliary standard packages are ```numpy```, ```pandas``` and ```scipy```; ```skimage``` (scikit-image) is used to load the image volume in the last section.

In [None]:
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import scipy as sci
import time
from skimage import io

Load the dataset ```gapminder``` from the ```plotly.express``` package into the variable ```df1``` (a dataframe).

Display some of the data in ```df1``` with:

```df1.head()```
```df1.tail()```
```df1.sample(5)```
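The inspection methods ```head```, ```tail``` and ```sample``` differ only in which rows they return. A minimal sketch on a small hypothetical DataFrame (the values are made up, not taken from gapminder):

```python
import pandas as pd

# Hypothetical toy data standing in for gapminder
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "year": [2002, 2002, 2007, 2007, 2007, 2007],
    "lifeExp": [70.1, 65.3, 72.0, 66.8, 80.2, 58.4],
})

first = toy.head(3)                    # first 3 rows (default is n=5)
last = toy.tail(2)                     # last 2 rows
rand = toy.sample(2, random_state=0)   # 2 random rows; random_state makes the draw reproducible
print(first, last, rand, sep="\n\n")
```

Without ```random_state```, ```sample``` returns different rows on every run, which is useful for spotting irregular entries anywhere in a large table.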



In [None]:
df1 = px.data.gapminder()
df1.sample(5)

## Exercise
Check the shape of the dataset ```df1``` by calling ```np.shape()``` on it.


Run the cell below for the solution.

In [None]:
# %load solutions/solution_01.py
np.shape(df1)

Gather information about individual countries, e.g. India or Germany:

In [None]:
df1[df1.country == "India"]

Run the cell below for the solution.

In [None]:
# %load solutions/solution_02.py
df1[df1.country == "Germany"]
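Boolean masks can also select several countries at once with ```Series.isin```, or combine conditions with ```&```; the parentheses around each comparison are required because ```&``` binds more tightly than ```==``` in Python. A sketch on a hypothetical frame with gapminder-style column names (the numbers are placeholders):

```python
import pandas as pd

# Hypothetical rows with gapminder-style column names
df = pd.DataFrame({
    "country": ["India", "Germany", "India", "Germany", "Chile"],
    "year": [2002, 2002, 2007, 2007, 2007],
    "pop": [10, 20, 30, 40, 50],
})

two = df[df.country.isin(["India", "Germany"])]              # rows for either country
recent = df[(df.country == "India") & (df.year >= 2007)]     # both conditions must hold
same = df.query("country == 'India' and year >= 2007")       # equivalent query string
print(two, recent, same, sep="\n\n")
```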

## Data Visualization with ```plotly```
### Bubble Scatter Chart

For more information, refer to the ```plotly.express``` documentation of ```px.scatter```.

In [None]:
fig = px.scatter(df1, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                 size="pop", color="continent", hover_name="country",
                 log_x=True, size_max=55, range_x=[100, 100000], range_y=[25, 90])

fig["layout"].pop("updatemenus")  # optional: drop the animation play/pause buttons
fig.show()
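With ```animation_frame="year"```, Plotly Express draws one frame per distinct value of the ```year``` column, and ```animation_group="country"``` tells it which marker in one frame corresponds to which marker in the next. Conceptually each frame holds the rows of one group of a ```groupby```; the sketch below reproduces that split with pandas on hypothetical data (it does not call plotly itself):

```python
import pandas as pd

# Hypothetical gapminder-like rows: two countries observed in three years
df = pd.DataFrame({
    "country": ["A", "B", "A", "B", "A", "B"],
    "year": [1952, 1952, 1957, 1957, 1962, 1962],
    "lifeExp": [40.0, 50.0, 42.5, 51.0, 45.0, 52.5],
})

# One group of rows per year, analogous to one animation frame per year
frames = {year: group for year, group in df.groupby("year")}
for year, rows in frames.items():
    print(year, list(rows["country"]))
```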

In [None]:
fig = px.scatter(df1, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country", facet_col="continent",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.show()

### Export to HTML

In [None]:
fig.write_html("figure.html")

## Data Visualization on Maps


In [None]:
fig = px.scatter_geo(df1, locations="iso_alpha", color="continent",
                     hover_name="country", size="pop",
                     animation_frame="year",
                     projection="natural earth")
fig.show()

### Load a Different Dataset

In [None]:
df2=px.data.carshare()
df2.head()

In [None]:
fig = px.scatter_mapbox(df2,lat="centroid_lat",lon="centroid_lon",color="peak_hour",size="car_hours",
                        color_continuous_scale=px.colors.cyclical.IceFire,size_max=15,zoom=10,
                        mapbox_style="carto-positron")

fig.show()
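Assuming ```peak_hour``` is an hour of the day (0 to 23), hours 23 and 0 are neighbours, which is presumably why a cyclical colour scale (```px.colors.cyclical.IceFire```) is used: cyclical scales are designed to wrap around. The circular distance between two hours makes the idea concrete; the helper ```hour_distance``` is a hypothetical illustration, not part of plotly:

```python
def hour_distance(a, b):
    """Smallest number of hours between two hours of the day (0-23), going either way round the clock."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


print(hour_distance(23, 0))   # 1
print(hour_distance(6, 18))   # 12
print(hour_distance(2, 22))   # 4
```

On a linear colour scale, 23 and 0 would get the most different colours even though they are only one hour apart.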

## Heatmaps

In [None]:
df3 = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
fig = px.density_mapbox(df3, lat='Latitude', lon='Longitude', z='Magnitude', radius=10,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="open-street-map")  # "stamen-terrain" tiles may no longer load without an API key
fig.show()

## Load a Custom Dataset from Kaggle

Make sure you have a Kaggle account and an API token (the file ```kaggle.json```, created in your Kaggle account settings), and install the ```kaggle``` package via pip:

```python3 -m pip install kaggle```


1. Create a hidden folder named ```.kaggle``` in your home directory (the commands below use UNIX shell syntax and do not work on Windows).
2. Copy the API key file ```kaggle.json``` into the created directory.
3. Restrict the permissions of ```~/.kaggle/kaggle.json``` so that only you can read and write it.
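The three steps can also be sketched in pure Python with ```pathlib```, ```shutil``` and ```os.chmod```, which runs on Windows as well (there ```os.chmod``` can only toggle the read-only flag, so the 600 permission has no real effect). The helper name ```install_kaggle_token``` is hypothetical; the target path ```~/.kaggle/kaggle.json``` is where the ```kaggle``` package looks for the token by default:

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_file, home=None):
    """Copy a Kaggle API token to <home>/.kaggle/kaggle.json (hypothetical helper)."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)    # step 1: like `mkdir -p ~/.kaggle`
    target = target_dir / "kaggle.json"
    shutil.copyfile(token_file, target)               # step 2: copy the token file
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)     # step 3: like `chmod 600`
    return target


# Usage (assumes kaggle.json was downloaded into the current directory):
# install_kaggle_token("kaggle.json")
```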

In [None]:
! mkdir -p ~/.kaggle

In [None]:
! cp kaggle.json ~/.kaggle/

In [None]:
! chmod 600 ~/.kaggle/kaggle.json

Download the dataset ```daily-air-quality-dataset-india```:


In [None]:
# assumes the kaggle executable is on your PATH; otherwise call it by its full path
! kaggle datasets download sumandey/daily-air-quality-dataset-india

In [None]:
!ls

In [None]:
! unzip daily-air-quality-dataset-india.zip
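The ```unzip``` command is not available on every system (notably Windows). Python's standard-library ```zipfile``` module does the same job; a minimal sketch, where the helper name ```extract_archive``` is hypothetical:

```python
import zipfile


def extract_archive(archive, dest="."):
    """Extract every member of a .zip file into dest and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)      # creates dest if it does not exist yet
        return zf.namelist()


# Usage, equivalent to the shell cell above:
# extract_archive("daily-air-quality-dataset-india.zip")
```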

In [None]:
df4 = pd.read_csv("air_quality_index.csv")

In [None]:
df4.head()

In [None]:
df4.dtypes

In [None]:
df4['DATE'] = pd.to_datetime(df4['DATE'])

df4['YEAR-MONTH'] = df4['DATE'].dt.strftime('%Y-%m')

df4.dtypes
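The conversion turns the ```DATE``` strings into ```datetime64``` values and then formats them back into ```YYYY-MM``` strings for the animation. A minimal sketch on hypothetical rows (the date format is an assumption) also shows ```dt.to_period('M')```, an alternative that keeps a proper monthly type instead of a string:

```python
import pandas as pd

# Hypothetical rows with a DATE column like the one used above
aq = pd.DataFrame({"DATE": ["2019-01-15", "2019-01-31", "2019-02-01"]})

aq["DATE"] = pd.to_datetime(aq["DATE"])              # object -> datetime64
aq["YEAR-MONTH"] = aq["DATE"].dt.strftime("%Y-%m")   # month label as a plain string
aq["PERIOD"] = aq["DATE"].dt.to_period("M")          # monthly Period values
print(aq.dtypes)
```

String labels such as ```"2019-01"``` sort correctly only because the year comes first; the ```Period``` column additionally supports date arithmetic.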

## Filter the Dataframe by Country

In [None]:
# Rows for India (country code 'IN')
data_INDIA = df4.loc[df4['COUNTRY'] == 'IN']
# Rows for the USA (country code 'US')
data_USA = df4.loc[df4['COUNTRY'] == 'US']
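Instead of one ```.loc``` filter per country, ```groupby``` can split the frame into one sub-frame per country code in a single pass, which scales to any number of countries. A sketch on hypothetical rows (the column names ```COUNTRY```, ```CITY``` and ```VALUE``` are taken from the cells above; the values are placeholders):

```python
import pandas as pd

# Hypothetical air-quality rows
aq = pd.DataFrame({
    "COUNTRY": ["IN", "US", "IN", "US", "IN"],
    "CITY": ["CityA", "CityC", "CityB", "CityC", "CityA"],
    "VALUE": [310, 45, 180, 50, 290],
})

# Dictionary mapping each country code to its rows
by_country = {code: rows for code, rows in aq.groupby("COUNTRY")}
data_INDIA = by_country["IN"]
data_USA = by_country["US"]
print(sorted(by_country))
```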

## Visualize Data for India 

In [None]:
fig = px.scatter(data_INDIA, x = "YEAR-MONTH", y = "VALUE", animation_frame = "YEAR-MONTH", animation_group = "CITY",
           color = "CITY", size='VALUE', range_y=[0,600],range_x =["2018-12", "2021-06"] )
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")
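As its name suggests, the dataset appears to hold daily readings, so each monthly animation frame above draws every daily point of every city. If one marker per city and month is preferred, the values can be averaged first with ```groupby```. A sketch on hypothetical rows; the result has the columns ```YEAR-MONTH```, ```CITY``` and ```VALUE``` and could be passed to ```px.scatter``` in place of ```data_INDIA```:

```python
import pandas as pd

# Hypothetical daily readings (already labelled by month)
aq = pd.DataFrame({
    "YEAR-MONTH": ["2019-01", "2019-01", "2019-01", "2019-02"],
    "CITY": ["CityA", "CityA", "CityB", "CityA"],
    "VALUE": [100.0, 200.0, 50.0, 120.0],
})

# One averaged VALUE per (month, city) pair
monthly = (
    aq.groupby(["YEAR-MONTH", "CITY"], as_index=False)["VALUE"]
      .mean()
)
print(monthly)
```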

## Visualize Data for USA


In [None]:
fig = px.scatter(data_USA, x = "YEAR-MONTH", y = "VALUE", animation_frame = "YEAR-MONTH", animation_group = "CITY",
           color = "CITY", size='VALUE', range_y=[0,600],range_x =["2018-12", "2021-06"] )
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")

## Visualize Data for both Countries

In [None]:
fig = px.scatter(df4, x = "YEAR-MONTH", y = "VALUE", animation_frame = "YEAR-MONTH", animation_group = "CITY",
           color = "COUNTRY", size='VALUE', range_y=[0,500],range_x =["2018-12", "2021-06"] )
fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")

## Visualize 3D-Data

In [None]:
# Read data from a csv
z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')

fig = go.Figure(data=[go.Surface(z=z_data.values)])

fig.update_layout(title='Mt Bruno Elevation', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))



## Visualize Volumetric (MRI) Data

In [None]:
# Import data


vol = io.imread("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/attention-mri.tif")
volume = vol.T
r, c = volume[0].shape

# Define frames
nb_frames = 68

fig = go.Figure(frames=[go.Frame(data=go.Surface(
    z=(6.7 - k * 0.1) * np.ones((r, c)),
    surfacecolor=np.flipud(volume[67 - k]),
    cmin=0, cmax=200
    ),
    name=str(k) # you need to name the frame for the animation to behave properly
    )
    for k in range(nb_frames)])

# Add data to be displayed before animation starts
fig.add_trace(go.Surface(
    z=6.7 * np.ones((r, c)),
    surfacecolor=np.flipud(volume[67]),
    colorscale='Gray',
    cmin=0, cmax=200,
    colorbar=dict(thickness=20, ticklen=4)
    ))


def frame_args(duration):
    return {
            "frame": {"duration": duration},
            "mode": "immediate",
            "fromcurrent": True,
            "transition": {"duration": duration, "easing": "linear"},
        }

sliders = [
            {
                "pad": {"b": 10, "t": 60},
                "len": 0.9,
                "x": 0.1,
                "y": 0,
                "steps": [
                    {
                        "args": [[f.name], frame_args(0)],
                        "label": str(k),
                        "method": "animate",
                    }
                    for k, f in enumerate(fig.frames)
                ],
            }
        ]

# Layout
fig.update_layout(
         title='Slices in volumetric data',
         width=600,
         height=600,
         scene=dict(
                    zaxis=dict(range=[-0.1, 6.8], autorange=False),
                    aspectratio=dict(x=1, y=1, z=1),
                    ),
         updatemenus = [
            {
                "buttons": [
                    {
                        "args": [None, frame_args(50)],
                        "label": "&#9654;", # play symbol
                        "method": "animate",
                    },
                    {
                        "args": [[None], frame_args(0)],
                        "label": "&#9724;", # pause symbol
                        "method": "animate",
                    },
                ],
                "direction": "left",
                "pad": {"r": 10, "t": 70},
                "type": "buttons",
                "x": 0.1,
                "y": 0,
            }
         ],
         sliders=sliders
)
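In the animation, frame ```k``` draws slice ```volume[67 - k]``` as a flat surface at height ```z = 6.7 - 0.1 k```, so the 68 frames step from the top slice at ```z = 6.7``` down to slice 0 at ```z = 0```, inside the fixed ```zaxis``` range ```[-0.1, 6.8]``` (this assumes the volume holds at least 68 slices, as the indices imply). The mapping can be checked with numpy, without downloading the image:

```python
import numpy as np

nb_frames = 68
k = np.arange(nb_frames)

heights = 6.7 - k * 0.1       # z position of the plane drawn in frame k
slice_index = 67 - k          # which volume slice that plane shows

# First and last frame: top slice high up, slice 0 at the bottom
print(heights[0], slice_index[0])
print(heights[-1], slice_index[-1])
```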


## Please Restart the Kernel

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)