# Altair - Vega Graph 

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite.
Altair offers a powerful and concise visualization grammar that enables you to build a wide range of statistical visualizations quickly.

https://altair-viz.github.io/getting_started/overview.html

### Specifying Data in Altair

Each top-level chart object (i.e. Chart, LayerChart, VConcatChart, HConcatChart, RepeatChart, and FacetChart) accepts a dataset as its first argument. The dataset can be specified in one of the following ways:

- as a Pandas DataFrame
- as a Data or related object (i.e. UrlData, InlineData, NamedData)
- as a URL string pointing to a JSON or CSV formatted text file
- as an object that supports the `__geo_interface__` (e.g. Geopandas GeoDataFrame, Shapely Geometries, GeoJSON Objects)
  
https://altair-viz.github.io/user_guide/data.html

**The goal of this exercise is to generate as many graphs as possible from the given dataset using Altair. For now, we focus on generating graphs from all available columns except those with string data types. We automate the generation of the JSON specifications, which are then used to render images in Visual Studio Code.**

In [None]:
import pandas as pd
import numpy as np
import json
import os

# Import the Altair API
import altair as alt

In [None]:
#df = pd.read_csv('Titanic_full.csv')

In [None]:
df = pd.read_csv('AB_NYC_2019.csv')

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.dtypes

In [None]:
#Calculating null values
nulls_count = {col: df[col].isnull().sum() for col in df.columns} 
print(nulls_count)

## Data Cleaning

In [None]:
# Drop columns in which more than 30% of the values are null.
# Replace null values with the column mean for int and float columns.
# For any other column, drop the rows that still contain null values.

is_null_count_out_of_range = {col: df[col].isnull().sum() / df.shape[0] * 100 > 30 for col in df.columns}

for k, v in is_null_count_out_of_range.items():
    if v:
        df.drop(k, axis=1, inplace=True)
    elif pd.api.types.is_numeric_dtype(df[k]):
        # Check the column dtype instead of df[k][0]: the row labelled 0
        # may already have been dropped by an earlier iteration
        df[k] = df[k].fillna(df[k].mean())
    else:
        drop_list = df[df[k].isnull()].index.tolist()
        df.drop(drop_list, axis=0, inplace=True)

# Recount the nulls once, after the cleaning has finished
nulls_count = {col: df[col].isnull().sum() for col in df.columns}

print(nulls_count)
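
To make the three cleaning rules easier to check, here is a minimal sketch that applies them to a small made-up DataFrame. The helper `clean_nulls` and the toy columns are hypothetical and exist only for this illustration. Unlike the cell above, the helper fills every numeric column first and drops rows afterwards, so the means can differ slightly from the column-by-column loop.

```python
import numpy as np
import pandas as pd


def clean_nulls(frame, max_null_pct=30):
    """Hypothetical helper: apply the three null-handling rules to a copy of frame."""
    frame = frame.copy()
    # Rule 1: drop columns whose share of nulls exceeds the threshold
    null_pct = frame.isnull().mean() * 100
    frame = frame.drop(columns=null_pct[null_pct > max_null_pct].index)
    # Rule 2: fill the remaining nulls in numeric columns with the column mean
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            frame[col] = frame[col].fillna(frame[col].mean())
    # Rule 3: drop the rows that still contain a null in any other column
    return frame.dropna()


toy = pd.DataFrame({
    "mostly_empty": [np.nan, np.nan, 1.0, np.nan, np.nan],  # 80% null: column dropped
    "price": [100.0, np.nan, 300.0, 200.0, 400.0],          # 20% null: filled with the mean
    "name": ["a", "b", None, "d", "e"],                     # 20% null: row dropped
})

cleaned = clean_nulls(toy)
print(cleaned)
```

The original `toy` frame is left untouched because the helper works on a copy.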

## Data type conversion

In [None]:
# Find all the unique values in each column
uniques = {col: df[col].unique().tolist() for col in df.columns}

print(uniques)

In [None]:
# Directory that will hold the generated files
directory = "Altair_Plots"

# Parent directory path
parent_dir = "../"

# Full output path
path = os.path.join(parent_dir, directory)

# Create the directory; do nothing if it already exists
os.makedirs(path, exist_ok=True)

In [None]:
# Write the dictionary of unique values created above to a text file

with open(os.path.join(path, 'Unique_values.txt'), 'w') as json_file:
    json.dump(uniques, json_file)

In [None]:
#checking for null value
df.isnull().any()

In [None]:
df.shape

In [None]:
# Print the number of unique values in each column
for k, v in uniques.items():
    print(k + ' : ' + str(len(v)))

In [None]:
# Convert columns with 10 or fewer unique values to categorical

for k, v in uniques.items():
    if len(v) <= 10:
        df[k] = df[k].astype('category')
        
df.dtypes
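
The same low-cardinality rule can be sketched on a toy frame with `DataFrame.nunique()`, which avoids building the full lists of unique values. The column names and values below are made up for illustration. Note that `nunique()` ignores nulls by default, whereas `unique()` counts `NaN` as a value, so the two approaches can disagree on columns that still contain nulls.

```python
import pandas as pd

# Hypothetical toy data: one low-cardinality column and one high-cardinality column
toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt"] * 6,  # 2 unique values
    "price": list(range(12)),                              # 12 unique values
})

# Count the distinct values per column and keep those at or below the threshold
unique_counts = toy.nunique()
low_cardinality = unique_counts[unique_counts <= 10].index

# Convert only the low-cardinality columns to the category dtype
for col in low_cardinality:
    toy[col] = toy[col].astype("category")

print(toy.dtypes)
```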

In [None]:
cat_col=df.select_dtypes(include=['category']).columns.tolist()
num_col=df.select_dtypes(include=['int64','float64']).columns.tolist()

print("Numerical Columns:\n" + str(num_col))
print("Categorical Columns:\n" + str(cat_col))

## Altair JSON generation

In [None]:
alt.data_transformers.disable_max_rows()

In [None]:
alt.Chart(df).mark_bar().encode( x='neighbourhood_group',y='reviews_per_month')

In [None]:
# Generate one JSON specification per (categorical, numerical) column pair

for cat in cat_col:
    for num in num_col:
        chart = alt.Chart(df).mark_bar().encode(x=cat, y=num)
        chart.save(os.path.join(path, str(cat) + " Vs " + str(num) + "_plot.json"))
print('JSON generated in the "Altair_Plots" folder for all combinations')

<div class="alert alert-block alert-info">
    <b>Copyright</b> 2020 Srushti Dhamangaonkar and Hung-Chih Huang<br>
    <br>Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:<br>
    <br>The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.<br>
    <br>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    <br><br>
    
<div class="text-center">
    <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/3.0/us/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>.<br>
</div></div>