There are several ways to specify a dataset in Altair:
1) as a pandas DataFrame
2) as a Data or related object (UrlData, InlineData, NamedData)
3) as a URL string pointing to a JSON or CSV file

More details can be found here: https://altair-viz.github.io/user_guide/data.html#id1

### 1. using a pandas DataFrame
Here the x column holds strings, so it is visualized on a categorical (nominal) scale, and the y column holds numbers, so it is visualized on a quantitative scale.

In [1]:
import altair as alt
import pandas as pd

data = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E'],
                     'y': [5, 3, 6, 7, 2]})
alt.Chart(data).mark_bar().encode(
    x='x',
    y='y',
)

### 2. using a Data object with a JSON-style list of records (no pandas DataFrame)

In [3]:
import altair as alt

data = alt.Data(values=[{'x': 'A', 'y': 5},
                        {'x': 'B', 'y': 3},
                        {'x': 'C', 'y': 6},
                        {'x': 'D', 'y': 7},
                        {'x': 'E', 'y': 2}])
alt.Chart(data).mark_bar().encode(
    x='x:N',  # specify nominal data
    y='y:Q',  # specify quantitative data
)

### extra markup for encoding types (nominal, quantitative, ordinal, temporal, geojson)

encoding shorthands https://altair-viz.github.io/user_guide/encodings/index.html#shorthand-description 
    
encoding datatypes https://altair-viz.github.io/user_guide/encodings/index.html#encoding-data-types 

### 3. referencing data by url
    

In [4]:
import altair as alt
from vega_datasets import data
url = data.cars.url

alt.Chart(url).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q'
)

### working with index data in pandas

In [5]:
import numpy as np
rand = np.random.RandomState(0)

data = pd.DataFrame({'value': rand.randn(100).cumsum()},
                    index=pd.date_range('2018', freq='D', periods=100))
data.head()

               value
2018-01-01  1.764052
2018-01-02  2.164210
2018-01-03  3.142948
2018-01-04  5.383841
2018-01-05  7.251399


### using the reset_index() method of pandas to turn the index into a column

In [6]:
alt.Chart(data.reset_index()).mark_line().encode(
    x='index:T',
    y='value:Q'
)
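The field is called `index` in the chart above because an unnamed pandas index becomes a column named `index` after `reset_index()`. A minimal sketch of this, and of naming the index first to get a more descriptive column; the name `date` is an assumption made for this example, not something the notebook defines.

```python
import numpy as np
import pandas as pd

rand = np.random.RandomState(0)
data = pd.DataFrame({'value': rand.randn(100).cumsum()},
                    index=pd.date_range('2018', freq='D', periods=100))

# An unnamed index becomes a column called 'index' after reset_index()
default_cols = list(data.reset_index().columns)

# Naming the index first gives the new column that name instead
named = data.rename_axis('date').reset_index()
named_cols = list(named.columns)

print(default_cols, named_cols)
```

With the named version, the encoding would reference `x='date:T'` instead of `x='index:T'`.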

### pandas data forms (long and wide)
Wide-form data has one row per independent variable, with metadata recorded in the row and column labels.

Long-form data has one row per observation, with metadata recorded within the table as values. Altair works best with long-form data, where each row corresponds to a single observation along with its metadata.

Examples of wide-form and long-form data are provided below:

In [8]:
# stock price of several companies over time
wide_form = pd.DataFrame({'Date': ['2007-10-01', '2007-11-01', '2007-12-01'],
                          'AAPL': [189.95, 182.22, 198.08],
                          'AMZN': [89.15, 90.56, 92.64],
                          'GOOG': [707.00, 693.00, 691.48]})
print(wide_form)

         Date    AAPL   AMZN    GOOG
0  2007-10-01  189.95  89.15  707.00
1  2007-11-01  182.22  90.56  693.00
2  2007-12-01  198.08  92.64  691.48


In [9]:
long_form = pd.DataFrame({'Date': ['2007-10-01', '2007-11-01', '2007-12-01',
                                   '2007-10-01', '2007-11-01', '2007-12-01',
                                   '2007-10-01', '2007-11-01', '2007-12-01'],
                          'company': ['AAPL', 'AAPL', 'AAPL',
                                      'AMZN', 'AMZN', 'AMZN',
                                      'GOOG', 'GOOG', 'GOOG'],
                          'price': [189.95, 182.22, 198.08,
                                     89.15,  90.56,  92.64,
                                    707.00, 693.00, 691.48]})
print(long_form)

         Date company   price
0  2007-10-01    AAPL  189.95
1  2007-11-01    AAPL  182.22
2  2007-12-01    AAPL  198.08
3  2007-10-01    AMZN   89.15
4  2007-11-01    AMZN   90.56
5  2007-12-01    AMZN   92.64
6  2007-10-01    GOOG  707.00
7  2007-11-01    GOOG  693.00
8  2007-12-01    GOOG  691.48


### altair works well with long-form data, as the example below illustrates

In [10]:
alt.Chart(long_form).mark_line().encode(
  x='Date:T',
  y='price:Q',
  color='company:N'
)

### wide-form data can also be visualized using layered charts, although this is less convenient https://altair-viz.github.io/user_guide/compound_charts.html#layer-chart

Converting between wide and long form can be done as a pre-processing step in pandas, using reshaping and pivot tables. The details can be found in this documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html 

With the melt() method of pandas, data can be converted from wide form to long form. The documentation can be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html

Another pandas method is pivot(), which converts long-form data back to wide form. The documentation can be found here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html 
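A minimal sketch of both conversions on the stock-price table used in this notebook. The column names `company` and `price` match the long-form example; `wide_again` is a name introduced here for illustration.

```python
import pandas as pd

wide_form = pd.DataFrame({'Date': ['2007-10-01', '2007-11-01', '2007-12-01'],
                          'AAPL': [189.95, 182.22, 198.08],
                          'AMZN': [89.15, 90.56, 92.64],
                          'GOOG': [707.00, 693.00, 691.48]})

# wide -> long: keep 'Date' as the identifier, stack the company columns
long_form = wide_form.melt(id_vars='Date', var_name='company', value_name='price')

# long -> wide: one row per date, one column per company
wide_again = long_form.pivot(index='Date', columns='company', values='price')
wide_again = wide_again.reset_index()
wide_again.columns.name = None  # drop the leftover 'company' axis label

print(long_form)
print(wide_again)
```

The melted table has one row per (date, company) pair, which is the shape Altair's encodings expect.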

### to avoid the data pre-processing in pandas, we can use altair's fold transform, which converts wide-form data to long-form data within the chart itself
documentation: https://altair-viz.github.io/user_guide/transform/fold.html#user-guide-fold-transform 


here is an example of the fold transform

In [11]:
alt.Chart(wide_form).transform_fold(
    ['AAPL', 'AMZN', 'GOOG'],
    as_=['company', 'price']
).mark_line().encode(
    x='Date:T',
    y='price:Q',
    color='company:N'
)

### 3 kinds of generated data (data generated for display within the chart specification):
1) sequence
2) graticule
3) sphere


documentation- https://altair-viz.github.io/user_guide/data.html#id1