# **Altair Exercise**
---

<a href="https://midoritoyota.netlify.app/" target="_blank"><img align="left" src="portfolio.png" title="See my portfolio!"/></a><img align="left" src="espaco.png"/>

<a href="mailto:midori.toyota@gmail.com" target="_blank"><img align="left" src="gmail.png" title="Contact me!"/></a><img align="left" src="espaco.png"/>

<a href="https://www.linkedin.com/in/midoritoyota/" target="_blank"> <img align="left" src="linkedin.png" title="Connect with me on linkedin!" /></a><img align="left" src="espaco.png"/>

<a href="https://github.com/MidoriToyota" target="_blank"> <img align="left" src="github.png" title="Follow me on github!"/></a>

<br/><br/>


## **Packages**

In [7]:
import altair as alt
import pandas as pd
import numpy as np

## **Getting Started**

https://altair-viz.github.io/getting_started/overview.html

In [74]:
# Loading data
cars = pd.read_csv("cars.csv")
cars.head()

Unnamed: 0,Make,Model,Type,Origin,DriveTrain,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length
0,Acura,MDX,SUV,Asia,All,"$36,945","$33,337",3.5,6.0,265,17,23,4451,106,189
1,Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820","$21,761",2.0,4.0,200,24,31,2778,101,172
2,Acura,TSX 4dr,Sedan,Asia,Front,"$26,990","$24,647",2.4,4.0,200,22,29,3230,105,183
3,Acura,TL 4dr,Sedan,Asia,Front,"$33,195","$30,299",3.2,6.0,270,20,28,3575,108,186
4,Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755","$39,014",3.5,6.0,225,18,24,3880,115,197


In [50]:
# Basic info
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 428 entries, 0 to 427
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Make         428 non-null    object 
 1   Model        428 non-null    object 
 2   Type         428 non-null    object 
 3   Origin       428 non-null    object 
 4   DriveTrain   428 non-null    object 
 5   MSRP         428 non-null    object 
 6   Invoice      428 non-null    object 
 7   EngineSize   428 non-null    float64
 8   Cylinders    426 non-null    float64
 9   Horsepower   428 non-null    int64  
 10  MPG_City     428 non-null    int64  
 11  MPG_Highway  428 non-null    int64  
 12  Weight       428 non-null    int64  
 13  Wheelbase    428 non-null    int64  
 14  Length       428 non-null    int64  
dtypes: float64(2), int64(6), object(7)
memory usage: 50.3+ KB
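
Note that `MSRP` and `Invoice` are stored as `object` (strings such as `"$36,945"`), so they cannot be plotted or correlated as numbers yet. A minimal sketch of one way to convert them, assuming every value follows the `$12,345` pattern seen in `cars.head()`; the `sample` frame is a stand-in with two rows copied from that output:

```python
import pandas as pd

# Two rows copied from the cars.head() output; `sample` is a stand-in for `cars`
sample = pd.DataFrame({
    "MSRP": ["$36,945", "$23,820"],
    "Invoice": ["$33,337", "$21,761"],
})

for col in ["MSRP", "Invoice"]:
    # Drop the dollar sign and the thousands separator, then cast to integer
    sample[col] = sample[col].str.replace(r"[$,]", "", regex=True).astype(int)

print(sample.dtypes)
```

The same loop could be applied to `cars` itself, provided no price is missing or formatted differently.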


In [72]:
# Basic chart
alt.Chart(cars).mark_point().encode(y = 'Origin', x = 'EngineSize')

In [71]:
# Aggregating functions
alt.Chart(cars).mark_point().encode(y = 'Origin', x = 'average(EngineSize)')
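
To check what `average(EngineSize)` computes, the same aggregation can be done in pandas with a group-by. This is only a sketch: the `toy` frame and its values are made up for illustration and are not taken from `cars.csv`.

```python
import pandas as pd

# Hypothetical rows, not taken from cars.csv
toy = pd.DataFrame({
    "Origin": ["Asia", "Asia", "Europe", "USA", "USA"],
    "EngineSize": [2.0, 3.0, 3.0, 4.0, 5.0],
})

# One mean per Origin: the value Altair would draw for each category
avg_engine = toy.groupby("Origin")["EngineSize"].mean()
print(avg_engine)
```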

In [73]:
# Changing type of graph
alt.Chart(cars).mark_bar().encode(y = 'Origin', x = 'average(EngineSize)')

In [32]:
# Storing the chart in a variable so it can be inspected
chart = alt.Chart(cars).mark_bar().encode(
    x='Origin',
    y='average(EngineSize)'
)

In [40]:
# Inspecting chart json
print(chart.to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-362fecc64292928e5a32b64921b7b919"
  },
  "datasets": {
    "data-362fecc64292928e5a32b64921b7b919": [
      {
        "Cylinders": 6.0,
        "DriveTrain": "All",
        "EngineSize": 3.5,
        "Horsepower": 265,
        "Invoice": "$33,337",
        "Length": 189,
        "MPG_City": 17,
        "MPG_Highway": 23,
        "MSRP": "$36,945",
        "Make": "Acura",
        "Model": "MDX",
        "Origin": "Asia",
        "Type": "SUV",
        "Weight": 4451,
        "Wheelbase": 106
      },
      {
        "Cylinders": 4.0,
        "DriveTrain": "Front",
        "EngineSize": 2.0,
        "Horsepower": 200,
        "Invoice": "$21,761",
        "Length": 172,
        "MPG_City": 24,
        "MPG_Highway": 31,
        "MSRP": "$23,820",
        "Make": "Acura",
        "Model": "RS

In [42]:
# Spelling out the encoding: field, data type, and aggregate as explicit arguments
alt.Chart(cars).mark_bar().encode(
    alt.Y('Origin', type='nominal'),
    alt.X('EngineSize', type='quantitative', aggregate='average')
)

In [48]:
# Changing the bar color and the axis title
alt.Chart(cars).mark_bar(color='firebrick').encode(
    alt.Y('Origin', type='nominal', title='Category'),
    alt.X('EngineSize', type='quantitative', aggregate='average')
)

In [None]:
# Saving Chart
chart.save('chart.html')

## **Testing plots**


### **Corrplot**

**Generic plot**

In [127]:
# Generic example
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2

source = pd.DataFrame({'x': x.ravel(),
                     'y': y.ravel(),
                     'z': z.ravel()})

alt.Chart(source).mark_rect().encode(
    x='x:O',
    y='y:O',
    color='z:Q'
)
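
Altair expects long-form data: one row per heatmap cell. A smaller sketch of the same `meshgrid` + `ravel` step (a 4x4 grid instead of 10x10) shows how the 2D grids are flattened into (x, y, z) rows:

```python
import numpy as np

# 4x4 grid: x varies along columns, y along rows
x, y = np.meshgrid(range(-2, 2), range(-2, 2))
z = x ** 2 + y ** 2

# ravel() flattens each 2D array row by row, so the three arrays stay aligned
cells = list(zip(x.ravel(), y.ravel(), z.ravel()))

print(x.shape)
print(cells[:4])
```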

**Cars dataset**

Select columns by their position number:

https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c
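
A minimal sketch of position-based selection with `iloc`, on a one-row stand-in frame (`toy`, with values copied from the first row of `cars.head()`):

```python
import pandas as pd

# One-row stand-in for cars, with only four of its columns
toy = pd.DataFrame({
    "Make": ["Acura"],
    "Model": ["MDX"],
    "Horsepower": [265],
    "MPG_City": [17],
})

# All rows, columns from position 2 to the end
numeric = toy.iloc[:, 2:]
print(list(numeric.columns))
```

In `cars`, `Horsepower` sits at position 9 according to `cars.info()`, which is why the slice `9:` is used for the correlation data.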

Plotting heatmap with altair from a dataframe:

https://towardsdatascience.com/altair-plot-deconstruction-visualizing-the-correlation-structure-of-weather-data-38fb5668c5b1

In [107]:
# Data preprocessing: keep the numeric columns from Horsepower onward
data = cars.iloc[:, 9:]
cor_data = (
    data.corr()
    .stack()
    .reset_index()
    .rename(columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'})
)

# Format each correlation as a string with 2 decimal places
cor_data['correlation_label'] = cor_data['correlation'].map('{:.2f}'.format)
cor_data.head()

Unnamed: 0,variable,variable2,correlation,correlation_label
0,Horsepower,Horsepower,1.0,1.0
1,Horsepower,MPG_City,-0.676699,-0.68
2,Horsepower,MPG_Highway,-0.647195,-0.65
3,Horsepower,Weight,0.630796,0.63
4,Horsepower,Wheelbase,0.387398,0.39
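
The column names `level_0`, `level_1`, and `0` in the rename come from `stack().reset_index()`. A sketch on a tiny made-up frame (`toy`, hypothetical values) makes the intermediate step visible:

```python
import pandas as pd

# Hypothetical data: b rises with a, c falls with a
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})

# Square correlation matrix -> one row per (variable, variable2) pair
long = toy.corr().stack().reset_index()
default_names = list(long.columns)
print(default_names)

long = long.rename(
    columns={0: "correlation", "level_0": "variable", "level_1": "variable2"}
)
print(long)
```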


In [125]:
# Base chart: x/y encodings shared by the heatmap and its text labels
base = alt.Chart(cor_data).encode(
    x='variable2:O',
    y='variable:O'    
)

# The correlation heatmap itself
cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(width=500, height = 500)

text = base.mark_text().encode(
    text='correlation_label',
    color=alt.condition(
        alt.datum.correlation > 0.5, 
        alt.value('white'),
        alt.value('black')
    )
)

cor_plot + text

### **Histogram**

In [142]:
# Histogram of engine sizes: bin the x axis, count rows per bin
alt.Chart(cars).mark_bar().encode(
    alt.X("EngineSize:Q", bin=alt.Bin(maxbins=20)),
    alt.Y('count()', stack=None)
)
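
Binning plus `count()` amounts to counting rows per interval. A rough NumPy analogue, assuming hand-picked bin edges (Altair chooses its own edges from `maxbins`, so the bins will not match exactly); the five values are the `EngineSize` entries from `cars.head()`:

```python
import numpy as np

# EngineSize values from the first five rows of cars
engine_sizes = np.array([3.5, 2.0, 2.4, 3.2, 3.5])

# Hand-picked edges (an assumption); the last bin includes its right edge
counts, edges = np.histogram(engine_sizes, bins=[2.0, 2.5, 3.0, 3.5, 4.0])

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {n}")
```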

In [151]:
# Interactive scatter plot with tooltips (pan and zoom enabled)
alt.Chart(cars).mark_circle(size=50).encode(
    x='Horsepower',
    y='MPG_Highway',
    color='Origin',
    tooltip=['Model', 'Origin', 'Horsepower', 'MPG_Highway']).interactive()

# **Plotly**
---

In [5]:
import pandas as pd
import plotly.graph_objects as go

ModuleNotFoundError: No module named 'plotly'

In [1]:
import sys
# Quote the prefix: an unquoted path containing a space is split into separate arguments
!conda install --yes --prefix "{sys.prefix}" plotly


EnvironmentLocationNotFound: Not a conda environment: C:\Users\Kamila
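
Before retrying the install, it can help to check whether the running kernel can see the package at all. A minimal sketch using only the standard library; `is_installed` and `pip_install_command` are hypothetical helper names, and pip is shown as one possible alternative to conda, not as what this notebook ended up using:

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


print(is_installed("plotly"))
print(pip_install_command("plotly"))
```

Passing the command as a list (for example to `subprocess.run`) avoids quoting problems when the interpreter path contains spaces.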


