# Altair
Altair is a declarative statistical visualization library for Python.
pip install altair vega_datasets

https://vitalflux.com/altair-python-install-jupyter-notebook/
Data in Altair is built around the pandas DataFrame.
The fundamental object in Altair is the Chart, which takes a DataFrame as a single argument:
chart = alt.Chart(data)

In [7]:
#import altair
import altair as alt 
import pandas as pd
import numpy as np

In [4]:
# Load the cars dataset as a pandas DataFrame
from vega_datasets import data
cars = data.cars()
cars

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA
...,...,...,...,...,...,...,...,...,...
401,ford mustang gl,27.0,4,140.0,86.0,2790,15.6,1982-01-01,USA
402,vw pickup,44.0,4,97.0,52.0,2130,24.6,1982-01-01,Europe
403,dodge rampage,32.0,4,135.0,84.0,2295,11.6,1982-01-01,USA
404,ford ranger,28.0,4,120.0,79.0,2625,18.6,1982-01-01,USA


In [5]:
alt.Chart(cars).mark_point().encode( x='Horsepower', y='Miles_per_Gallon', color='Origin').interactive()

## Graph1
When data is specified as a DataFrame, the encoding is quite simple, as Altair uses the data type information provided by Pandas to automatically determine the data types required in the encoding.

In [8]:
data = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E'],
                     'y': [5, 3, 6, 7, 2]})
alt.Chart(data).mark_bar().encode(x='x', y='y',)

In [10]:
from vega_datasets import data
url = data.cars.url

# When referencing data by URL, the data type must be given explicitly,
# because Altair does not load the data and so cannot infer types from it.
alt.Chart(url).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q'
)

## Including Index Data
By design Altair only accesses dataframe columns, not dataframe indices. At times, relevant data appears in the index. For example:

In [11]:
import numpy as np
rand = np.random.RandomState(0)

data = pd.DataFrame({'value': rand.randn(100).cumsum()},
                    index=pd.date_range('2018', freq='D', periods=100))
data.head()

Unnamed: 0,value
2018-01-01,1.764052
2018-01-02,2.16421
2018-01-03,3.142948
2018-01-04,5.383841
2018-01-05,7.251399


In [None]:
alt.Chart(data.reset_index()).mark_line().encode(
    x='index:T',
    y='value:Q'
)

In [17]:
# Without reset_index(), the date index is not available as a field; only 'value' can be encoded.
alt.Chart(data).mark_line().encode(x='value:Q', y='value:Q')

## Long and Wide Form
https://altair-viz.github.io/user_guide/data.html
There are two common conventions for storing data in a dataframe, sometimes called long-form and wide-form. Both are sensible patterns for storing data in a tabular format; briefly, the difference is this:

wide-form data has one row per independent variable, with metadata recorded in the row and column labels.

long-form data has one row per observation, with metadata recorded within the table as values.

Altair’s grammar works best with long-form data, in which each row corresponds to a single observation along with its metadata.

A concrete example will help make this distinction clearer. Consider a dataset consisting of stock prices of several companies over time. The cells below build first the wide-form and then the long-form version of this data.
Altair works best with the long-form version, because relevant data and metadata are stored within the table itself, rather than within the labels of rows and columns.

In [19]:
wide_form = pd.DataFrame({'Date': ['2022-10-01', '2022-11-01', '2022-12-01'],
                          'AAPL': [189.95, 182.22, 198.08],
                          'AMZN': [89.15, 90.56, 92.64],
                          'GOOG': [707.00, 693.00, 691.48]})
print(wide_form)

         Date    AAPL   AMZN    GOOG
0  2022-10-01  189.95  89.15  707.00
1  2022-11-01  182.22  90.56  693.00
2  2022-12-01  198.08  92.64  691.48


In [20]:
long_form = pd.DataFrame({'Date': ['2022-10-01', '2022-11-01', '2022-12-01',
                                   '2022-10-01', '2022-11-01', '2022-12-01',
                                   '2022-10-01', '2022-11-01', '2022-12-01'],
                          'company': ['AAPL', 'AAPL', 'AAPL',
                                      'AMZN', 'AMZN', 'AMZN',
                                      'GOOG', 'GOOG', 'GOOG'],
                          'price': [189.95, 182.22, 198.08,
                                     89.15,  90.56,  92.64,
                                    707.00, 693.00, 691.48]})
print(long_form)

         Date company   price
0  2022-10-01    AAPL  189.95
1  2022-11-01    AAPL  182.22
2  2022-12-01    AAPL  198.08
3  2022-10-01    AMZN   89.15
4  2022-11-01    AMZN   90.56
5  2022-12-01    AMZN   92.64
6  2022-10-01    GOOG  707.00
7  2022-11-01    GOOG  693.00
8  2022-12-01    GOOG  691.48


In [21]:
alt.Chart(long_form).mark_line().encode(
  x='Date:T',
  y='price:Q',
  color='company:N'
)

In [22]:
wide_form.melt('Date', var_name='company', value_name='price')

Unnamed: 0,Date,company,price
0,2022-10-01,AAPL,189.95
1,2022-11-01,AAPL,182.22
2,2022-12-01,AAPL,198.08
3,2022-10-01,AMZN,89.15
4,2022-11-01,AMZN,90.56
5,2022-12-01,AMZN,92.64
6,2022-10-01,GOOG,707.0
7,2022-11-01,GOOG,693.0
8,2022-12-01,GOOG,691.48


In [23]:
long_form.pivot(index='Date', columns='company', values='price').reset_index()

company,Date,AAPL,AMZN,GOOG
0,2022-10-01,189.95,89.15,707.0
1,2022-11-01,182.22,90.56,693.0
2,2022-12-01,198.08,92.64,691.48


In [24]:
#Converting Between Long-form and Wide-form: Fold Transform
alt.Chart(wide_form).transform_fold(
    ['AAPL', 'AMZN', 'GOOG'],
    as_=['company', 'price']
).mark_line().encode(
    x='Date:T',
    y='price:Q',
    color='company:N'
)
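
As a rough illustration of what the fold transform computes, here is a plain-Python sketch. The helper name fold and the list-of-dicts layout are assumptions made for this example, not Altair API, and the sketch simplifies by keeping only the non-folded columns next to the new key and value columns.

```python
# Wide-form rows, mirroring the wide_form table above.
wide_rows = [
    {'Date': '2022-10-01', 'AAPL': 189.95, 'AMZN': 89.15, 'GOOG': 707.00},
    {'Date': '2022-11-01', 'AAPL': 182.22, 'AMZN': 90.56, 'GOOG': 693.00},
    {'Date': '2022-12-01', 'AAPL': 198.08, 'AMZN': 92.64, 'GOOG': 691.48},
]

def fold(rows, fields, as_=('key', 'value')):
    """Hypothetical helper: emit one output row per (input row, folded field) pair."""
    key_name, value_name = as_
    folded = []
    for row in rows:
        # Columns that are not folded (here only 'Date') are carried over unchanged.
        kept = {name: val for name, val in row.items() if name not in fields}
        for field in fields:
            folded.append({**kept, key_name: field, value_name: row[field]})
    return folded

long_rows = fold(wide_rows, ['AAPL', 'AMZN', 'GOOG'], as_=('company', 'price'))
print(len(long_rows))
print(long_rows[0])
```

Each wide row with three company columns becomes three long rows, which is the shape the color='company:N' encoding expects.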

## Sequence Generator
Here is an example of using the sequence() function to generate a sequence of x data, along with a Calculate Transform to compute y data.

In [28]:
data = alt.sequence(0, 10, 0.1, as_='x')

KeyError: 0

In [29]:
alt.Chart(data).transform_calculate(
    y='sin(datum.x)'
).mark_line().encode(
    x='x:Q',
    y='y:Q',
)
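
To make the two steps concrete, the following plain-Python sketch reproduces the values that the sequence generator and the calculate transform would produce. The helper name sequence is an assumption for this example; the sketch treats the stop value as exclusive, as the generator does.

```python
import math

def sequence(start, stop, step):
    """Hypothetical helper: values from start (inclusive) up to stop (exclusive)."""
    count = math.ceil((stop - start) / step)
    # Multiply instead of accumulating to limit floating-point drift.
    return [start + i * step for i in range(count)]

# Counterpart of alt.sequence(0, 10, 0.1, as_='x')
rows = [{'x': x} for x in sequence(0, 10, 0.1)]

# Counterpart of transform_calculate(y='sin(datum.x)')
for datum in rows:
    datum['y'] = math.sin(datum['x'])

print(len(rows))
```

In the Altair version none of these rows exist in Python; the values are generated by the renderer when the chart is displayed.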

## Graticule Generator
Another type of data that is convenient to generate in the chart itself is the latitude/longitude lines on a geographic visualization, known as a graticule. These can be created using Altair’s graticule() generator function. Here is a simple example:

In [31]:
data = alt.graticule(step=[15, 15])

In [33]:
alt.Chart(data).mark_geoshape(stroke='black').project('orthographic', rotate=[0, -45, 0])

## Save Chart
https://altair-viz.github.io/user_guide/saving_charts.html

## Case Study
https://altair-viz.github.io/case_studies/exploring-weather.html

In [36]:
from vega_datasets import data
df = data.seattle_weather()
df.head()

Unnamed: 0,date,precipitation,temp_max,temp_min,wind,weather
0,2012-01-01,0.0,12.8,5.0,4.7,drizzle
1,2012-01-02,10.9,10.6,2.8,4.5,rain
2,2012-01-03,0.8,11.7,7.2,2.3,rain
3,2012-01-04,20.3,12.2,5.6,4.7,rain
4,2012-01-05,1.3,8.9,2.8,6.1,rain


In [37]:
alt.Chart(df).mark_tick().encode(
    x='precipitation',
)

It looks as though precipitation is skewed towards lower values; that is, when it rains in Seattle, it usually doesn’t rain very much. It is difficult to see patterns across continuous variables, and so to better see this, we can create a histogram of the precipitation data. For this we first discretize the precipitation values by adding a binning to x. Additionally, we set our encoding channel y with count. The result is a histogram of precipitation values:

In [38]:
alt.Chart(df).mark_bar().encode(
    alt.X('precipitation', bin=True),
    y='count()'
)
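
The binning and counting described above can be sketched in plain Python. The readings and the fixed bin width of 5 are illustrative assumptions; with bin=True Altair chooses its own step size.

```python
import math
from collections import Counter

# Hypothetical precipitation readings, for illustration only.
precipitation = [0.0, 10.9, 0.8, 20.3, 1.3, 0.0, 2.5, 0.0, 4.3, 27.7]

BIN_WIDTH = 5.0  # assumed fixed width, not Altair's default choice

def bin_start(value, width=BIN_WIDTH):
    """Lower edge of the bin that contains value."""
    return math.floor(value / width) * width

# Binning maps each value to an interval; count() tallies the rows per interval.
counts = Counter(bin_start(v) for v in precipitation)

for start in sorted(counts):
    print(f"[{start:5.1f}, {start + BIN_WIDTH:5.1f}): {counts[start]}")
```

Each bar of the histogram corresponds to one entry of counts: the bin gives the bar's position on x and the tally gives its height on y.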

Next, let’s look at how precipitation in Seattle changes throughout the year. Altair natively supports dates and discretization of dates when we set the type to temporal (shorthand T). For example, in the following plot, we compute the average precipitation for each month. To discretize the data into months, we can use a month binning (see TimeUnit Transform for more information about this and other timeUnit binnings):

In [39]:
alt.Chart(df).mark_line().encode(
    x='month(date):T',
    y='average(precipitation)'
)
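
What month(date) combined with average(precipitation) computes can be sketched in plain Python. The records below are hypothetical; the point is that the month time unit drops the year, so the same month from different years lands in one group.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (date, precipitation) records spanning two years.
records = [
    (date(2012, 1, 1), 0.0),
    (date(2012, 1, 2), 10.9),
    (date(2012, 2, 1), 3.0),
    (date(2013, 1, 1), 1.1),
    (date(2013, 2, 1), 5.0),
]

# month(date): keep only the month, so January 2012 and January 2013 share a group.
by_month = defaultdict(list)
for day, value in records:
    by_month[day.month].append(value)

# average(precipitation): reduce each group to its mean.
monthly_average = {month: mean(values) for month, values in sorted(by_month.items())}
print(monthly_average)
```

Using yearmonth(date) instead would group on the (year, month) pair, giving one point per calendar month rather than one per month of the year.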

In [40]:
alt.Chart(df).mark_line().encode(
    x='yearmonth(date):T',
    y='max(temp_max)',
)

In [41]:
alt.Chart(df).mark_line().encode(
    x='year(date):T',
    y='mean(temp_max)',
)

In [42]:
alt.Chart(df).mark_bar().encode(
    x='mean(temp_max)',
    y='year(date):O'
)

In [43]:
alt.Chart(df).mark_bar().encode(
    x='mean(temp_range):Q',
    y='year(date):O'
).transform_calculate(
    temp_range="datum.temp_max - datum.temp_min"
)

In [44]:
alt.Chart(df).mark_bar().encode(
    x='month(date):N',
    y='count()',
    color='weather',
)

In [45]:
scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#e7ba52', '#c7c7c7', '#aec7e8', '#1f77b4', '#9467bd'])
alt.Chart(df).mark_bar().encode(
    x=alt.X('month(date):N', title='Month of the year'),
    y='count()',
    color=alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale),
)

In [46]:
alt.Chart(df).mark_point().encode(
    alt.X('temp_max', title='Maximum Daily Temperature (C)'),
    alt.Y('temp_range:Q', title='Daily Temperature Range (C)'),
    alt.Color('weather', scale=scale),
    alt.Size('precipitation', scale=alt.Scale(range=[1, 200]))
).transform_calculate(
    "temp_range", "datum.temp_max - datum.temp_min"
).properties(
    width=600,
    height=400
).interactive()

In [47]:
alt.Chart(df).mark_bar().encode(
    x='count()',
    y='weather:N',
    color=alt.Color('weather:N', scale=scale),
)

In [49]:
## Pie Charts
source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8]})

alt.Chart(source).mark_arc().encode(
    theta=alt.Theta(field="value", type="quantitative"),
    color=alt.Color(field="category", type="nominal"),)

In [50]:
## Radial charts
source = pd.DataFrame({"values": [12, 23, 47, 6, 52, 19]})

base = alt.Chart(source).encode(
    theta=alt.Theta("values:Q", stack=True),
    radius=alt.Radius("values", scale=alt.Scale(type="sqrt", zero=True, rangeMin=20)),
    color="values:N",
)

c1 = base.mark_arc(innerRadius=20, stroke="#fff")

c2 = base.mark_text(radiusOffset=10).encode(text="values:Q")

c1 + c2