# **Data Visualization**

**Types of plots**
* Line Plot
  * Display trends over time
  * Compare datasets with a continuous independent variable
  * Illustrate cause-and-effect relationships
  * Visualize continuous data
  * Can be misleading if the axis scales are not chosen carefully to reflect the data accurately
* Bar Plot
  * Represent the magnitude of the data
  * Compare different categories or groups
  * Display discrete data that has distinct categories
  * Show how different categories contribute to a whole
  * Easily ranked or ordered
  * Can be misleading if the bars or axis scales are chosen inaccurately
* Scatter Plot
  * Examine the relationship between two continuous variables
  * Investigate patterns or trends
  * Detect outliers or unusual observations
  * Identify clusters or groups
* Box Plot
  * Summarize a distribution through key statistics (median, quartiles, range)
  * Compare the distribution of a continuous variable across different categories or groups
  * Examine spread and skewness of a dataset, visualizing quartiles
  * Identify and analyze potential outliers
  * Visualize summary statistics
  * Ignoring or mishandling outliers can distort the interpretation of the data and mask important insights
* Histogram
  * Depict the shape and concentration of the data, whether it is symmetric, skewed, or bimodal
  * Showcase data variability, allowing you to observe concentrations, gaps, and clusters that reveal patterns or subgroups
  * The choice of bins affects how the data is represented
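
To illustrate the last point about binning, here is a minimal sketch in plain Python. The sample values and the `bin_counts` helper are made up for illustration; plotting libraries do this counting internally. The same ten values look like one smooth block with two bins, while five bins reveal a gap in the middle.

```python
# Hypothetical sample: ten made-up measurements.
values = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]


def bin_counts(data, n_bins, lo, hi):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        # Clamp the index so a value equal to hi lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


coarse = bin_counts(values, 2, 0, 10)  # two wide bins
fine = bin_counts(values, 5, 0, 10)    # five narrower bins

print(coarse)
print(fine)
```

With the coarse bins the empty range between 4 and 6 is invisible; with the finer bins it shows up as a zero count.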

**Plot Library**

* Matplotlib
  * line plots, scatter plots, bar charts, histograms, pie charts, box plots, and heat maps
* Pandas
* Seaborn
  * specialized statistical visualizations
  * categorical plots, count plots, heat maps, violin plots, scatter plots, bar plots
* Folium
  * Geospatial visualization
* Plotly
  * Highly interactive plots and dashboards
  * Web-based
* PyWaffle
  * Categorical data using waffle charts
  * waffle charts, square pie charts, donut charts

In [None]:
import pandas as pd
df_can = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Canada.csv')

In [55]:
#set index and remove the name
df_can.set_index('Country', inplace=True)
df_can.index.name = None

In [None]:
df_can.head()

# **Maps & Geospatial Data**

* Folium: visualize geospatial data and create maps using latitude and longitude values


In [None]:
import folium
world_map = folium.Map()
world_map

In [None]:
# define the world map centered around Canada with a low zoom level
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4)

#add custom tiles
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr='Tiles &copy; Esri — Source: Esri, DeLorme, NAVTEQ, USGS, and the GIS User Community',
    name='ESRI World Imagery',
    max_zoom=20
).add_to(world_map)
folium.LayerControl().add_to(world_map)

world_map

In [None]:
#Add marker and label
Canada_map = folium.Map(location=[56.130, -106.35], zoom_start=4)

folium.Marker(location=[51.2538, -85.3232], popup='Ontario').add_to(Canada_map)

ontario = folium.map.FeatureGroup()  #add a red marker using feature group
ontario.add_child(
    folium.features.CircleMarker(
        [51.2538, -85.3232],
        radius=5,  #how big you want the circle marker to be
        color='red',
        fill_color='red'
    )
)
Canada_map.add_child(ontario)


Canada_map

In [None]:
#add multiple markers
locations = [
    {"location": [45.4215, -75.6989], "popup": "Ottawa"},
    {"location": [53.5461, -113.4938], "popup": "Edmonton"},
    {"location": [49.2827, -123.1207], "popup": "Vancouver"},
]

# Marker Cluster: prevent overcrowding
from folium.plugins import MarkerCluster

marker_cluster = MarkerCluster().add_to(Canada_map)

for loc in locations:
    folium.Marker(location=loc["location"],
                  popup=loc["popup"]).add_to(marker_cluster)
Canada_map

**Choropleth Maps**
* A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable displayed on the map.
* The higher the measurement, the darker the color.

To create a `Choropleth` map, pass the following main parameters:

1. `geo_data`, which is the GeoJSON file.
2. `data`, which is the dataframe containing the data.
3. `columns`, which represents the columns in the dataframe that will be used to create the `Choropleth` map.
4. `key_on`, which is the key in the GeoJSON file that holds the value to match against the dataframe. To find it, open the GeoJSON file in any text editor and note the key that contains the country names, since the countries are our variable of interest. In this case, **name** is that key, so it is passed as `feature.properties.name`. The key is case-sensitive, so pass it exactly as it appears in the GeoJSON file.
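
Instead of a text editor, the GeoJSON keys can also be inspected from Python. The sketch below uses a tiny made-up GeoJSON string (not the real countries file) and a hypothetical list of dataframe country names, only to show how `feature.properties.name` maps onto the nested structure and why exact, case-sensitive matching matters.

```python
import json

# A tiny, made-up GeoJSON document with the same general shape as a countries file.
sample_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "id": "CAN", "properties": {"name": "Canada"},
     "geometry": {"type": "Point", "coordinates": [-106.35, 56.13]}},
    {"type": "Feature", "id": "IND", "properties": {"name": "India"},
     "geometry": {"type": "Point", "coordinates": [78.96, 20.59]}}
  ]
}
"""

geo = json.loads(sample_geojson)

# Keys available under "properties" for the first feature.
property_keys = sorted(geo["features"][0]["properties"].keys())

# key_on='feature.properties.name' corresponds to feature["properties"]["name"].
names = [feature["properties"]["name"] for feature in geo["features"]]

# Hypothetical dataframe values: "india" differs from "India" only by case.
data_countries = ["Canada", "india"]
unmatched = [c for c in data_countries if c not in names]

print(property_keys)
print(names)
print(unmatched)
```

Any name listed in `unmatched` would have no matching shape on the map, so it is worth checking before plotting.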

In [None]:
# download countries geojson file
! wget --quiet https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/world_countries.json

print('GeoJSON file downloaded!')

In [68]:
world_geo = r'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/world_countries.json' # geojson file

# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)

In [None]:
folium.Choropleth(
    geo_data=world_geo,
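    data=df_can.rename_axis('Country').reset_index(),  # 'Country' was moved to the index earlier, so restore it as a column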
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    reset=True
).add_to(world_map)

# display map
world_map
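
How dark each country appears depends on how the values are split into bins. As a sketch, assuming the `bins` argument of `folium.Choropleth` (which accepts either a number of bins or a sequence of bin edges), evenly spaced edges can be computed as below. The totals and the `linear_edges` helper are made up for illustration; real values would come from the `Total` column.

```python
# Hypothetical totals per country, for illustration only.
totals = [120, 4500, 30000, 691904]


def linear_edges(values, n_bins):
    """Return n_bins + 1 evenly spaced edges from 0 up to just above max(values)."""
    top = max(values) + 1  # nudge the last edge so the maximum falls inside a bin
    step = top / n_bins
    return [round(i * step) for i in range(n_bins + 1)]


edges = linear_edges(totals, 5)
print(edges)
# The list could then be passed as bins=edges when creating the Choropleth.
```

With strongly skewed data like this, evenly spaced edges put most countries in the lightest bin, which is one reason to consider other binning schemes.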

# **Interactive Dashboard**

* Real-time visuals make the many moving parts of a business easier to follow
* Display key performance indicators (KPIs)
* Provide the big picture at a glance

**Web-based dashboarding tool**
* Plotly: interactive, open-source plotting library that supports over 40 unique chart types; available in Python and JavaScript; figures can be displayed in Jupyter Notebook, saved to standalone HTML files, or served as part of pure Python-built web applications using Dash.
  * Plotly Graph Objects: low-level interface to figures, traces, and layout
  * Plotly Express: High-level wrapper
  * https://plotly.com/python/
* Panel
* Voilà
* Streamlit
* Bokeh
* ipywidgets
* matplotlib
* Bowtie
* Flask

Python dashboarding tools: https://pyviz.org/dashboarding/

John Snow's data journalism: https://www.theguardian.com/news/datablog/2013/mar/15/john-snow-cholera-map

In [72]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# 60 random ages and 60 random incomes, so each age pairs with exactly one income
age_array=np.random.randint(25,55,60)
income_array=np.random.randint(300000,700000,60)

In [None]:
# First, create an empty figure using go.Figure()
fig=go.Figure() # a Figure holds traces and layout, and is serialized to JSON for rendering
fig

In [None]:
# Next, create a scatter plot by passing a go.Scatter() trace to the add_trace method
# In go.Scatter we define the x-axis data and y-axis data, and set the mode to markers with blue as the marker color
fig.add_trace(go.Scatter(x=age_array, y=income_array, mode='markers', marker=dict(color='blue')))

In [None]:
# Set the figure title and the axis titles with update_layout
fig.update_layout(title='Economic Survey', xaxis_title='Age', yaxis_title='Income')
# Display the figure
fig.show()

In [None]:
# create line chart
numberofbicyclessold_array=[50,100,40,150,160,70,60,45]
months_array=["Jan","Feb","Mar","April","May","June","July","August"]

fig_line=go.Figure()
fig_line.add_trace(go.Scatter(x=months_array, y=numberofbicyclessold_array, mode='lines', marker=dict(color='green')))
fig_line.update_layout(title='Bicycle Sales', xaxis_title='Months', yaxis_title='Number of Bicycles Sold')
fig_line.show()

In [None]:
score_array=[80,90,56,88,95]
grade_array=['Grade 6','Grade 7','Grade 8','Grade 9','Grade 10']
fig = px.bar( x=grade_array, y=score_array, title='Pass Percentage of Classes')
fig.show()

In [None]:
heights_array = np.random.normal(160, 11, 200)
# Use the Plotly Express histogram function px.histogram and provide the input data as x
fig = px.histogram(x=heights_array,title="Distribution of Heights")
fig.show()

In [None]:
exp_percent= [20, 50, 10,8,12]
house_holdcategories = ['Grocery', 'Rent', 'School Fees','Transport','Savings']
fig = px.pie(values=exp_percent, names=house_holdcategories, title='Household Expenditure')
fig.show()

In [None]:
data = dict(
    character=["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
    parent=["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve" ],
    value=[10, 14, 12, 10, 2, 6, 6, 4, 4])

fig = px.sunburst(
    data,
    names='character',
    parents='parent',
    values='value',
    title="Family chart"
)
fig.show()

# **Dash**

* Open-source Python user interface library from Plotly
* Dash's front end renders components using React.js.
* Makes it easy to build a GUI
* Declarative and Reactive
* Rendered in web browser and can be deployed to servers
* Inherently cross-platform and mobile ready

**Dash Component**
* Core Component
  * Describe higher-level interactive components generated with JavaScript, HTML, and CSS through the React.js library
  * `from dash import dcc` (Dash 2.0 and later; older releases used `import dash_core_components as dcc`)
  * Example: creating a slider, input area, check items, and date picker
* HTML Component
  * Has components for every HTML tag
  * The `dash.html` module (formerly the `dash_html_components` library) provides classes for all HTML tags, and the keyword arguments describe HTML attributes such as `style`, `className`, and `id`.
  * `from dash import html` (older releases used `import dash_html_components as html`)

User Guide: https://dash.plotly.com/

**Connect Core and HTML components using Callbacks**

* A callback function is a Python function that Dash calls automatically.

    @app.callback(Output, Input)
    def callback_function(input_value):
        ...
        return some_result

* The `@app.callback` decorator wraps the callback function to tell Dash to call it whenever the value of the input component changes.
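
The registration idea behind such a decorator can be sketched in plain Python. This is a toy registry, not Dash's implementation: `on_change`, `registry`, and `simulate_input_change` are hypothetical names that only mimic how a decorator can record a function so that a framework can call it later.

```python
# Toy registry mapping an input id to the function that should react to it.
registry = {}


def on_change(input_id):
    """Return a decorator that registers the decorated function under input_id."""
    def decorator(func):
        registry[input_id] = func
        return func  # the function itself is left unchanged
    return decorator


@on_change("input-yr")
def describe_year(entered_year):
    return f"Showing flights for {entered_year}"


def simulate_input_change(input_id, new_value):
    """Stand-in for the framework: look up the registered callback and call it."""
    return registry[input_id](new_value)


result = simulate_input_change("input-yr", 2015)
print(result)
```

The user never calls `describe_year` directly; registering it through the decorator is enough for the stand-in framework to find and invoke it.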

Python decorators reference: https://realpython.com/primer-on-python-decorators/

Callback examples: https://dash.plotly.com/basic-callbacks

Gallery: https://dash.gallery/Portal/

In [None]:
import pandas as pd
import plotly.express as px
import dash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output

# Read the airline data into pandas dataframe
airline_data =  pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv',
                            encoding = "ISO-8859-1",
                            dtype={'Div1Airport': str, 'Div1TailNum': str,
                                   'Div2Airport': str, 'Div2TailNum': str})
# Randomly sample 500 data points. Setting the random state to be 42 so that we get same result.
#data = airline_data.sample(n=500, random_state=42)
# Pie Chart Creation
#fig = px.pie(data, values='Flights', names='DistanceGroup', title='Distance group proportion by flights')

# Create a dash application
app = dash.Dash(__name__)

# Design the dash layout
# Create an outer division using html.Div and add title to the dashboard using html.H1 component
# Add description about the graph using HTML P (paragraph) component
# Finally, add graph component.
app.layout = html.Div(children=[html.H1('Airline Dashboard',style={'textAlign': 'center', 'color': '#503D36', 'font-size': 40}),  #application title
                                html.Div(["Input: ", dcc.Input(id='input-yr', value=2010, type='number', style={'height': '50px', 'font-size': 35}),], style={'font-size': 40}),  #value will be updated in the callback function
                                html.Br(),
                                html.Br(),
                                html.Div(dcc.Graph(id='bat-plot')),
                                #html.P('Proportion of distance group (250 mile distance interval group) by flights.', style={'textAlign':'center', 'color': '#F57241'}),
                                #dcc.Graph(figure=fig),
                    ])

# Add the callback decorator: Dash calls get_graph whenever the input value changes
@app.callback(Output(component_id='bat-plot', component_property='figure'),
              Input(component_id='input-yr', component_property='value'))
def get_graph(entered_year):
    # An empty input box yields None; skip the update instead of failing on int(None)
    if entered_year is None:
        return dash.no_update
    df = airline_data[airline_data['Year'] == int(entered_year)]
    # Total flights per airline for that year, keeping the ten largest
    g1 = df.groupby(['Reporting_Airline'])['Flights'].sum().nlargest(10).reset_index()
    fig1 = px.bar(g1, x='Reporting_Airline', y='Flights', title='Top 10 airline carriers in year ' + str(entered_year) + ' in terms of number of flights')
    return fig1

# Run the application
if __name__ == '__main__':
    app.run(port=8002, host='127.0.0.1', debug=True)  # on older Dash releases, use app.run_server(...) instead
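
The aggregation inside `get_graph` can be checked in isolation, without starting the server. The sketch below runs the same filter, group, and `nlargest` steps on a small made-up frame that only borrows the column names from the airline data; `top_airlines` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows that only borrow the column names used in the callback.
sample = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2010, 2011],
    "Reporting_Airline": ["AA", "DL", "AA", "UA", "AA"],
    "Flights": [1, 1, 1, 1, 1],
})


def top_airlines(df, year, n=10):
    """Filter to one year, total the flights per airline, and keep the n largest."""
    one_year = df[df["Year"] == int(year)]
    return (one_year.groupby("Reporting_Airline")["Flights"]
            .sum()
            .nlargest(n)
            .reset_index())


top = top_airlines(sample, 2010, n=2)
print(top)
```

Keeping the data preparation in a separate function like this makes the callback body shorter and lets the logic be tested with ordinary assertions.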

