<h1 style="text-align:center;">Mini Project - Altair - Vega Graph Hands-On

<hr style="height:.15em; background-color:gray">

<div>
    <h3 style="text-align:center;">Table of Contents</h3>
</div>

>>1. [Learning Goals](#Learning-Goals)
>>2. [Introduction](#Introduction)
>>3. [Installation](#Installation)
>>4. [Imports](#Imports)
>>5. [Main Content](#Main-Content)  
     >>> a. [Loading Data](#Loading-Data)  
     >>> b. [File Information](#File-Information)  
     >>> c. [Getting Data Ready for Visualization](#Getting-Data-Ready-for-Visualization)  
     >>> d. [Using Altair Visualizations](#Using-Altair-Visualizations)  
     >>> e. [Save as JSON](#Save-as-JSON)  
>>6. [Exercises](#Exercises)  
>>7. [Additional Resources](#Additional-Resources)  
>>8. [Citations](#Citations)  
>>9. [Footer](#Footer)  

<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
<h1 style="text-align:center;">Learning Goals
<br>

> By the end of this tutorial, you will be able to:
>>- Identify the objects and encodings used in Altair.
>>- List the data formats accepted by Altair.
>>- Plot data using the existing Altair - Vega graphs.
>>- Read the JSON specification that Altair generates.
>>- Understand the use cases and possible future applications of Altair.

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
    <h1 style="text-align:center;">Introduction</h1>
    <br>

The aim of this notebook is to introduce Vega and Altair and to walk step by step through using Vega - Altair graphs with a dataset. We also cover how to customize graphs. The topics are followed by exercises that give the reader hands-on practice with Altair code.   


The output plot can be saved in several formats, such as PNG, SVG, or JSON. The JSON form is especially useful: it can be edited later and the image regenerated with the changes. The generated JSON could also serve as training data for a neural network that learns to recognize different kinds of plots.

<h3 class="text-primary">What is Vega?
  

Vega is a visualization grammar, a declarative language for creating, saving, and sharing interactive visualization designs. With Vega, you can describe the visual appearance and interactive behavior of a visualization in a JSON format, and generate web-based views using Canvas or SVG.    

Vega provides basic building blocks for a wide variety of visualization designs: data loading and transformation, scales, map projections, axes, legends, and graphical marks such as rectangles, lines, plotting symbols, etc. Interaction techniques can be specified using reactive signals that dynamically modify a visualization in response to input event streams.  

A Vega specification defines an interactive visualization in a JSON format. Specifications are parsed by Vega’s JavaScript runtime to generate either static images or interactive web-based views. Vega provides a convenient representation for computational generation of visualizations, and can serve as a foundation for new APIs and visual analysis tools.
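To make these building blocks concrete, here is a minimal sketch of a Vega bar chart specification written as a Python dictionary and serialized with the standard `json` module. The field names (`category`, `amount`) and the values are made up for illustration; only the top-level keys (`data`, `scales`, `axes`, `marks`) are the point of the example.

```python
import json

# A small Vega bar chart spec as a Python dict.
# The dataset name "table" and its fields are hypothetical toy values.
vega_spec = {
    "$schema": "https://vega.github.io/schema/vega/v5.json",
    "width": 300,
    "height": 150,
    # Data loading: inline values here; a "url" entry could be used instead.
    "data": [
        {
            "name": "table",
            "values": [
                {"category": "A", "amount": 28},
                {"category": "B", "amount": 55},
                {"category": "C", "amount": 43},
            ],
        }
    ],
    # Scales map data values to pixel positions.
    "scales": [
        {
            "name": "xscale",
            "type": "band",
            "domain": {"data": "table", "field": "category"},
            "range": "width",
            "padding": 0.1,
        },
        {
            "name": "yscale",
            "type": "linear",
            "domain": {"data": "table", "field": "amount"},
            "range": "height",
            "nice": True,
        },
    ],
    # Axes visualize the scales.
    "axes": [
        {"orient": "bottom", "scale": "xscale"},
        {"orient": "left", "scale": "yscale"},
    ],
    # Graphical marks: one rectangle per data row.
    "marks": [
        {
            "type": "rect",
            "from": {"data": "table"},
            "encode": {
                "enter": {
                    "x": {"scale": "xscale", "field": "category"},
                    "width": {"scale": "xscale", "band": 1},
                    "y": {"scale": "yscale", "field": "amount"},
                    "y2": {"scale": "yscale", "value": 0},
                }
            },
        }
    ],
}

# Serialize to the JSON text that the Vega runtime would parse.
vega_json = json.dumps(vega_spec, indent=2)
print(vega_json)
```

Altair spares us from writing this by hand: it generates the more compact Vega-Lite form of such a specification from Python calls.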


<h3 class="text-primary">What is Altair?

Altair is a Python API for Vega. More precisely, it is a declarative statistical visualization library for Python, based on Vega and Vega-Lite. Altair offers a powerful and concise visualization grammar that enables you to build a wide range of statistical visualizations quickly.  

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
    <h1 style="text-align:center;">Installation</h1>
<br>

 
Altair can be installed, along with the example datasets in vega_datasets, using:  

   > $ pip install altair vega_datasets

In [None]:
!pip install altair vega_datasets

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
    <h1 style="text-align:center;">Imports</h1>
    <br>

In [None]:
import pandas as pd
import numpy as np
import json
import os
from IPython.display import JSON

#import  Altair API  
import altair as alt

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
    <h1 style="text-align:center;">Main Content</h1>
    <br>

<div class="alert alert-block alert-success">
    <h2 style="text-align:center;">Loading Data</h2>
    <br>

Each top-level chart object (i.e. Chart, LayerChart, VConcatChart, HConcatChart, RepeatChart, and FacetChart) accepts a dataset as its first argument. The dataset can be specified in one of the following ways:

* as a pandas DataFrame
* as a Data or related object (i.e. UrlData, InlineData, NamedData)
* as a URL string pointing to a JSON- or CSV-formatted text file
* as an object that supports the `__geo_interface__` (e.g. GeoPandas GeoDataFrame, Shapely geometries, GeoJSON objects)
  

In [None]:
# !pip install kaggle
# !pip install --upgrade google-api-python-client
# !pip install google-colab

In [None]:
# from google.colab import files
# uploaded = files.upload()

In [None]:
#Kaggle API
# !mkdir -p ~/.kaggle
# !cp kaggle.json ~/.kaggle/

# !chmod 600 ~/.kaggle/kaggle.json

In [None]:

# !kaggle datasets download -d dgomonov/new-york-city-airbnb-open-data

In [None]:
# !ls

In [None]:
#Unzip command
# !unzip -q Train.zip -d .

In [None]:
df1 = pd.read_csv('COVID-19.csv')

In [None]:
df = pd.read_csv('AB_NYC_2019.csv')

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-success">
    <h2 style="text-align:center;">File Information</h2>
    <br>

In [None]:
df.head()

In [None]:
df1.head()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df1.shape

In [None]:
df1.describe()

In [None]:
df.dtypes

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-success">
    <h2 style="text-align:center;">Getting Data ready for Visualization</h2>
    <br>

<h3 class="text-primary">Data Cleaning

In [None]:
#Calculating null values
def null_count(df):
    nulls_count = {col: df[col].isnull().sum() for col in df.columns} 
    print(nulls_count)

In [None]:
null_count(df)
null_count(df1)

In [None]:
# Drop columns in which more than 30% of the values are null.
# For the remaining columns, replace nulls with the column mean when the
# column is numeric; otherwise drop the rows that still contain a null.

# Percentage of null values per column, computed once up front
null_pct = df.isnull().mean() * 100

for col, pct in null_pct.items():
    if pct > 30:
        df = df.drop(columns=col)
    elif pd.api.types.is_numeric_dtype(df[col]):
        # Check the column dtype rather than the first value, which may
        # itself be null or may have been dropped already
        df[col] = df[col].fillna(df[col].mean())
    else:
        df = df.dropna(subset=[col])

# Verify that no null values remain
nulls_count = df.isnull().sum().to_dict()
print(nulls_count)
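
To see what this cleaning rule does, it can help to run it on a tiny table first. The sketch below wraps the same three steps in a helper; the function name `clean_nulls`, its `max_null_pct` parameter, and the toy columns are hypothetical and not part of the Airbnb dataset.

```python
import pandas as pd


def clean_nulls(frame, max_null_pct=30):
    """Hypothetical helper: drop sparse columns, mean-fill numeric nulls,
    and drop rows with nulls in non-numeric columns."""
    out = frame.copy()
    null_pct = out.isnull().mean() * 100
    for col, pct in null_pct.items():
        if pct > max_null_pct:
            out = out.drop(columns=col)
        elif pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out = out.dropna(subset=[col])
    return out


# Toy data: one mostly-empty column, one numeric column with a gap,
# and one text column with a gap.
toy = pd.DataFrame({
    "mostly_missing": [None, None, 1.0, None],  # 75% null
    "price": [100.0, None, 300.0, 200.0],       # 25% null, numeric
    "name": ["a", "b", None, "d"],              # 25% null, text
})

cleaned = clean_nulls(toy)
print(cleaned)
```

Here `mostly_missing` is removed, the missing `price` is filled with the mean of the other three prices, and the row without a `name` is dropped.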

<h3 class="text-primary">Data type conversion

In [None]:
# Convert columns that have 10 or fewer unique values
# to the categorical dtype

for col in df.columns:
    if df[col].nunique() <= 10:
        df[col] = df[col].astype('category')

df.dtypes

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-success">
    <h2 style="text-align:center;">Using Altair Visualizations</h2>
    <br>

<h3 class="text-primary">Chart object:

The fundamental object in Altair is the Chart, which takes a dataset, typically a DataFrame, as its first argument:

 >  chart = alt.Chart(data)



<h3 class="text-primary">Encodings and Marks:

With the Chart object we can specify how we would like the data to be visualized. 
This is done via the mark attribute of the chart object, which is most conveniently set through the __Chart.mark_*__ methods. 
For example, we can show the data as points using mark_point().  
For a point chart:

   >       alt.Chart(data).mark_point()   

For a bar chart:

   >       alt.Chart(data).mark_bar()


For more details on marks and their properties, see the link below:  
https://altair-viz.github.io/user_guide/marks.html  

**Axis, color, size**  
We can map various encoding channels, or channels for short, to columns in the dataset. 
For example, we could encode the column a of the data with the x channel, which represents the x-axis position of the points. This can be done directly via the Chart.encode() method:   

   >       alt.Chart(data).mark_point().encode(x='a')  


The encode() method builds a key-value mapping from encoding channels (such as 'x', 'y', 'color', 'shape', 'size', etc.) to columns in the dataset, accessed by column name.
We can swap the x and y channels to switch between a horizontal and a vertical plot.

For pandas DataFrames, Altair automatically determines the appropriate data type for the mapped column, which in this case is nominal, i.e. an unordered categorical.  

**Aggregation**  
 Altair has a built-in syntax for aggregating data. 
 For example, we can compute the average of b for each value of a by specifying the aggregate within the column identifier:

   >       alt.Chart(data).mark_point().encode(
   >           x='a',
   >           y='average(b)')  


For more details on encodings and their properties, see the link below:  
https://altair-viz.github.io/user_guide/encoding.html

<h3 class="text-primary">Short hand notation:

The long form:
   >       y = alt.Y(field='b', type='quantitative', aggregate='average')

can be written in shorthand notation as:
   >       y = alt.Y('average(b):Q')

<h3 class="text-primary">Customize Visualization:

We can specify the axis titles using the title argument of the channel classes, and we can specify the color of the marks by setting the color keyword of the Chart.mark_* methods to any valid HTML color string:  
 
   >       alt.Chart(data).mark_bar(color='firebrick').encode(  
   >           alt.Y('a', title='category'),  
   >           alt.X('average(b)', title='avg(b) by category')  
   >       )

In [None]:
# By default Altair refuses to plot DataFrames with more than 5000 rows and
# raises the following error:
#   "MaxRowsError: The number of rows in your dataset is greater than the maximum
#    allowed (5000). For information on how to plot larger datasets
#    in Altair, see the documentation."
# Disabling the limit lets Altair use all rows of the DataFrame.

alt.data_transformers.disable_max_rows()

<h3 class="text-primary">Stacked Bar Graph:

In [None]:
alt.Chart(df).mark_bar(
            cornerRadiusTopLeft=3,
            cornerRadiusTopRight=3
).encode( x='neighbourhood_group',y='count():Q', color='room_type')

<h3 class="text-primary">Layered Bar Chart:

In [None]:
#Layered Bar Chart
alt.Chart(df1).mark_bar(opacity=0.7).encode(
    x='deaths:O',
    y=alt.Y('count():Q', stack=None),
    color="region",
)

<h3 class="text-primary">Line chart with Confidence Interval:

In [None]:

from vega_datasets import data
source = data.cars()

line = alt.Chart(source).mark_line().encode(
    x='Year',
    y='mean(Miles_per_Gallon)'
)

band = alt.Chart(source).mark_errorband(extent='ci').encode(
    x='Year',
    y=alt.Y('Miles_per_Gallon', title='Miles/Gallon'),
)

band + line

<h3 class="text-primary">Stacked Bar Chart with Text Overlay:

In [None]:

from vega_datasets import data

bars = alt.Chart(df).mark_bar().encode(
    x=alt.X('sum(minimum_nights):Q', stack='zero'),
    y=alt.Y('neighbourhood_group:N'),
    color=alt.Color('neighbourhood_group')
)

text = alt.Chart(df).mark_text(dx=-15, dy=3, color='white').encode(
    x=alt.X('sum(minimum_nights):Q', stack='zero'),
    y=alt.Y('neighbourhood_group:N'),
    detail='neighbourhood_group:N',
    text=alt.Text('sum(minimum_nights):Q', format='.1f')
)

bars + text

<h3 class="text-primary">Simple Strip Plot:

In [None]:

alt.Chart(df).mark_tick().encode(
    x='price:Q',
    y='room_type:O'
)

<h3 class="text-primary">Simple Scatter Plot with Tooltips:

In [None]:

alt.Chart(df).mark_circle(size=60).encode(
    y='price',
    x='availability_365',
    color='neighbourhood_group',
    tooltip=['price', 'neighbourhood_group', 'room_type']
).interactive()

<h3 class="text-primary">Interactive Scatter Plot 

In [None]:
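# This cell uses the Altair 4 API. In Altair 5 and later, selection_single and
# add_selection are deprecated in favor of alt.selection_point and add_params.
input_dropdown = alt.binding_select(options=['Brooklyn','Bronx','Manhattan','Queens','Staten Island'])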
selection = alt.selection_single(fields=['neighbourhood_group'], bind=input_dropdown, name='Select ')
color = alt.condition(selection,
                    alt.Color('neighbourhood_group:N', legend=None),
                    alt.value('lightgray'))

alt.Chart(df).mark_point().encode(
    x='availability_365:Q',
    y='price:Q',
    color=color,
    tooltip=['price', 'neighbourhood_group', 'room_type']
).add_selection(
    selection
).transform_filter(
    selection
)

<h3 class="text-primary">Scatter Plot with Href:

In [None]:

alt.Chart(df).transform_calculate(
    url='https://www.google.com/search?q=' + alt.datum.name
).mark_point().encode(
    x='availability_365:Q',
    y='price:Q',
    color='neighbourhood_group:N',
    href='url:N',
    tooltip=['name:N', 'url:N']
)

<h3 class="text-primary">Scatter Matrix:

In [None]:
alt.Chart(df).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='neighbourhood_group:N'
).properties(
    width=150,
    height=150
).repeat(
    row=['price', 'availability_365', 'minimum_nights'],
    column=['minimum_nights', 'availability_365', 'price']
).interactive()

<h3 class="text-primary">Violin Plot:

In [None]:
alt.Chart(df).transform_density(
    'price',
    as_=['price', 'density'],
    extent=[5, 50],
    groupby=['neighbourhood_group']
).mark_area(orient='horizontal').encode(
    y='price:Q',
    color='neighbourhood_group:N',
    x=alt.X(
        'density:Q',
        stack='center',
        impute=None,
        title=None,
        axis=alt.Axis(labels=False, values=[0],grid=False, ticks=True),
    ),
    column=alt.Column(
        'neighbourhood_group:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).properties(
    width=100
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)

<h3 class="text-primary">Maps:

In [None]:

countries = alt.topo_feature(data.world_110m.url, feature='countries')

# World countries background
background = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(
    width=500,
    height=300
).project('naturalEarth1')

# COVID-19 case locations on background
points = alt.Chart(df1).transform_aggregate(
    latitude='mean(latitude)',
    longitude='mean(longitude)',
    count='count()',
    groupby=['country','confirmed_cases','deaths']
).mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.Size('count:Q', title='count'),
    color=alt.value('steelblue'),
    tooltip=['country','confirmed_cases:N','deaths:N']
).properties(
    title='World Map'
)

background + points

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-success">
    <h2 style="text-align:center;">Save as JSON</h2>
    <br>

The JSON output stores everything needed to describe a chart: its dimensions, marks, encodings, and, when the data is embedded, the data itself. This makes it a convenient format for inspecting, modifying, and re-rendering a chart later.   
Once you have visualized your data, perhaps you would like to publish it somewhere on the web. The Chart.save() method writes the chart to disk in the format given by the file extension: .html produces a stand-alone HTML document that renders the chart with the Vega-Embed JavaScript package, and .json produces the Vega-Lite specification:  

   >       chart = alt.Chart(data).mark_bar().encode(  
   >           x='a',  
   >           y='average(b)')    

   >       chart.save('chart.json')

<h3 class="text-primary">JSON structure:

Below is the JSON structure generated from the above code:

In [None]:
j1 = {  
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",  
  "config": {   
    "view": {   
      "continuousHeight": 300,  
      "continuousWidth": 400  
    }   
  },  
  "data": {  
    "name": "data-347f1284ea3247c0f55cb966abbdd2d8"  
  },  
  "datasets": {  
    "data-347f1284ea3247c0f55cb966abbdd2d8": [  
      {  
        "a": "C",  
        "b": 2  
      },    
      {   
        "a": "C",  
        "b": 7  
      },  
      {  
        "a": "C",  
        "b": 4  
      },  
      {  
        "a": "D",  
        "b": 1  
      }  
  
    ]  
  },   
  "encoding": {  
    "x": {  
      "field": "a",  
      "type": "nominal"  
    },  
    "y": {  
      "aggregate": "average",  
      "field": "b",  
      "type": "quantitative"  
    }  
  },  
  "mark": "bar"   
}  

print(json.dumps(j1, indent=2))
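
Reading such a specification mostly comes down to three questions: which mark is used, which column feeds each channel, and where the data lives. The sketch below answers them with the standard `json` module on a shortened copy of the structure above. The helper name `summarize_spec` is hypothetical, and it assumes a single-view chart whose data is embedded under `datasets`.

```python
import json

# A shortened copy of the specification shown above.
spec_text = """
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "data": {"name": "data-347f1284ea3247c0f55cb966abbdd2d8"},
  "datasets": {
    "data-347f1284ea3247c0f55cb966abbdd2d8": [
      {"a": "C", "b": 2},
      {"a": "C", "b": 7},
      {"a": "C", "b": 4},
      {"a": "D", "b": 1}
    ]
  },
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {"aggregate": "average", "field": "b", "type": "quantitative"}
  },
  "mark": "bar"
}
"""


def summarize_spec(spec):
    """Hypothetical helper: pull the mark, channels, and row count
    out of a single-view spec with embedded data."""
    mark = spec["mark"]
    if isinstance(mark, dict):  # the mark may also be written as {"type": "bar", ...}
        mark = mark["type"]
    channels = {
        channel: {key: enc.get(key) for key in ("field", "aggregate", "type")}
        for channel, enc in spec["encoding"].items()
    }
    rows = spec["datasets"][spec["data"]["name"]]
    return {"mark": mark, "channels": channels, "n_rows": len(rows)}


spec = json.loads(spec_text)
summary = summarize_spec(spec)
print(summary)

# Tweaking the spec: switch the mark to a line and serialize it again.
edited = dict(spec, mark="line")
edited_text = json.dumps(edited, indent=2)
```

The edited text can be rendered again by any Vega-Lite viewer, which is what makes the JSON form easy to tweak after the fact.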

In [None]:
source = data.barley()

chart = alt.Chart(source).mark_bar().encode(
    x='sum(yield):Q',
    y='year:O',
    color='year:N',
    row='site:N'
)

chart.save('Bar_chart.json')

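We can save the plot in other formats as well, such as PNG or SVG. These image formats need an additional converter package (vl-convert-python in Altair 5, altair_saver in Altair 4).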

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
<h1 style="text-align:center;">Exercises</h1>
<br>

<h3 class="text-primary">Exercise 1:

In [None]:
source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
)

Try to convert the above bar graph to a line graph.  
Hint: check the .mark_* methods

<h3 class="text-primary">Exercise 2:

In [None]:
source = data.stocks()

alt.Chart(source).mark_area(
    color="lightgreen",
    interpolate='step-after',
    line=True
).encode(
    x='date',
    y='price'
)

Convert the above graph to a filled step chart of a single stock by adding transform_filter().  
Hint: Pass the parameter -  alt.datum.symbol == 'GOOG'

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
<h1 style="text-align:center;">Additional Resources</h1>
<br> 

Simple Charts: https://altair-viz.github.io/gallery/index.html#simple-charts

Bar Charts: https://altair-viz.github.io/gallery/index.html#bar-charts

Area Charts: https://altair-viz.github.io/gallery/index.html#area-charts

Scatter Charts: https://altair-viz.github.io/gallery/index.html#scatter-plots

Histograms: https://altair-viz.github.io/gallery/index.html#histograms

Maps: https://altair-viz.github.io/gallery/index.html#maps

Interactive charts: https://altair-viz.github.io/gallery/index.html#interactive-charts

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
<h1 style="text-align:center;">Citations</h1>
<br>

* [Citing `Altair Docs`](https://altair-viz.github.io/index.html)
* [Citing `Vega`](https://vega.github.io/vega/)

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">

<div class="alert alert-block alert-warning">
<h1 style="text-align:center;">Footer</h1>
<br>

<div class="alert alert-block alert-info">
    <b>Copyright</b> 2020 Srushti Dhamangaonkar <br>
    <br>Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:<br>
    <br>The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.<br>
    <br>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    <br><br>
    
<div class="text-center">
    <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/3.0/us/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>.<br>
</div></div>

<a href="#Mini-Project---Altair---Vega-Graph-Hands-On"><p >Scroll Top
<hr style="height:.15em; background-color:gray">