# The Pandas Tutorial

https://www.kaggle.com/learn/pandas

***

In [1]:
import numpy as np

import pandas as pd

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [2]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])

s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
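The `float64` dtype above comes from the `np.nan` entry. NaN is a floating-point value, so pandas stores the whole Series as floats. Here is a minimal sketch comparing the two cases. The names `ints` and `with_nan` are only for illustration.

```python
import numpy as np
import pandas as pd

# Without a missing value, pandas infers an integer dtype
ints = pd.Series([1, 3, 5, 6, 8])

# Adding np.nan forces the values to float, because NaN is a float
with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])

print(ints.dtype)      # an integer dtype (int64 on typical builds)
print(with_nan.dtype)  # float64

# Neither call passed an index, so both get a default RangeIndex 0..n-1
print(with_nan.index)
```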

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns. The values come from `np.random.randn`, so your numbers will differ from the ones shown below:

In [3]:
dates = pd.date_range("20130101", periods=6)

dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
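`date_range` builds a fixed-frequency `DatetimeIndex`. When you give only a start and `periods`, the frequency defaults to daily, which is the `freq='D'` in the output. If you would rather give an end date than a count, passing `start` and `end` produces the same six days, because both endpoints are included by default. A small sketch:

```python
import pandas as pd

by_periods = pd.date_range("20130101", periods=6)

# Same range expressed with explicit endpoints; both ends are inclusive by default
by_endpoints = pd.date_range(start="2013-01-01", end="2013-01-06")

print(by_periods.equals(by_endpoints))  # True
```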

In [4]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

df

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.110834 | -0.234889 | -0.271206 | 0.541894 |
| 2013-01-02 | 0.643621 | 0.390751 | -0.034837 | 0.412795 |
| 2013-01-03 | 0.479122 | 1.268756 | -1.249518 | -0.204053 |
| 2013-01-04 | -1.711659 | 0.171306 | 0.361514 | 0.175856 |
| 2013-01-05 | 1.239032 | 2.28253 | 0.305351 | 0.807809 |
| 2013-01-06 | -0.226519 | 0.061397 | 0.713015 | 2.628229 |
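`np.random.randn` draws new numbers on every run, so the table above will not match yours. One way to get repeatable output is to draw the values from a seeded NumPy `Generator`. This is an assumption on my part, not something the tutorial does. The seed `42` is arbitrary.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# A seeded Generator returns the same "random" numbers on every run
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)  # (6, 4)
```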


Creating a DataFrame by passing a dict whose values can each be converted into a Series-like column:

In [12]:
df2 = pd.DataFrame(
    {
        "A": 7.0,
        "B": pd.Timestamp("20140102"),
        "C": pd.Series(1, index=list(range(8)), dtype="float32"),
        "D": np.array([7] * 8, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train", "test", "train", "test", "train"]),
        "F": "foo",
    }
)

df2

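|  | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 7.0 | 2014-01-02 | 1.0 | 7 | test | foo |
| 1 | 7.0 | 2014-01-02 | 1.0 | 7 | train | foo |
| 2 | 7.0 | 2014-01-02 | 1.0 | 7 | test | foo |
| 3 | 7.0 | 2014-01-02 | 1.0 | 7 | train | foo |
| 4 | 7.0 | 2014-01-02 | 1.0 | 7 | test | foo |
| 5 | 7.0 | 2014-01-02 | 1.0 | 7 | train | foo |
| 6 | 7.0 | 2014-01-02 | 1.0 | 7 | test | foo |
| 7 | 7.0 | 2014-01-02 | 1.0 | 7 | train | foo |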
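In the dict above, the scalar entries (`"A": 7.0` and `"F": "foo"`) are broadcast to every row. The number of rows comes from the array-like entries, here the `"C"` Series with index `0..7`. The sketch below shows the same rule on a smaller dict with made-up values. It also shows that a dict of only scalars needs an explicit index, because there is nothing to infer a length from.

```python
import pandas as pd

# "y" is a scalar, so pandas repeats it to match the length of "x"
small = pd.DataFrame({"x": [1, 2, 3], "y": 0.5})
print(small)

# With only scalars there is no length to infer, so pandas raises ValueError
try:
    pd.DataFrame({"x": 1, "y": 0.5})
except ValueError as err:
    print("ValueError:", err)

# Supplying an index fixes it: the scalars fill every row of that index
scalars_only = pd.DataFrame({"x": 1, "y": 0.5}, index=[0, 1])
print(scalars_only)
```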


The columns of the resulting DataFrame have different dtypes.

In [14]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
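Because each column keeps its own dtype, you can inspect a single column's dtype or pick out columns by kind. Here is a short sketch that rebuilds `df2` the same way as above and uses `select_dtypes` to keep only the numeric columns.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": 7.0,
        "B": pd.Timestamp("20140102"),
        "C": pd.Series(1, index=list(range(8)), dtype="float32"),
        "D": np.array([7] * 8, dtype="int32"),
        "E": pd.Categorical(["test", "train"] * 4),
        "F": "foo",
    }
)

# The dtype of a single column
print(df2["C"].dtype)  # float32

# Keep only numeric columns; datetime, category and object columns are dropped
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))  # ['A', 'C', 'D']
```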

Use `head()` to view the top rows of the frame (five by default) and `tail()` to view the bottom rows:

In [15]:
df.head()

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.110834 | -0.234889 | -0.271206 | 0.541894 |
| 2013-01-02 | 0.643621 | 0.390751 | -0.034837 | 0.412795 |
| 2013-01-03 | 0.479122 | 1.268756 | -1.249518 | -0.204053 |
| 2013-01-04 | -1.711659 | 0.171306 | 0.361514 | 0.175856 |
| 2013-01-05 | 1.239032 | 2.28253 | 0.305351 | 0.807809 |
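Both `head()` and `tail()` take an optional row count. The sketch below uses a stand-in frame with the same shape and index as `df`. Its values come from `np.arange` rather than random draws, so the rows are easy to tell apart.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

print(df.head(3))  # first three rows: 2013-01-01 to 2013-01-03
print(df.tail(2))  # last two rows: 2013-01-05 and 2013-01-06
```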


Display the index and the columns:

In [18]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [19]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')
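`df.index` and `df.columns` are pandas `Index` objects, so they support positional access and conversion to plain Python lists. A brief sketch on a zero-filled stand-in frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.zeros((6, 4)), index=dates, columns=list("ABCD"))

print(df.columns.tolist())  # ['A', 'B', 'C', 'D']
print(df.index[0])          # 2013-01-01 00:00:00 (a Timestamp)
print(len(df.index))        # 6
```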

### Super handy: `describe()` shows a quick statistical summary of your data

In [21]:
df.describe()

|  | A | B | C | D |
|---|---|---|---|---|
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | 0.089072 | 0.656642 | -0.02928 | 0.727088 |
| std | 1.012401 | 0.945895 | 0.687494 | 0.992452 |
| min | -1.711659 | -0.234889 | -1.249518 | -0.204053 |
| 25% | -0.142181 | 0.088874 | -0.212114 | 0.235091 |
| 50% | 0.294978 | 0.281029 | 0.135257 | 0.477345 |
| 75% | 0.602496 | 1.049255 | 0.347473 | 0.74133 |
| max | 1.239032 | 2.28253 | 0.713015 | 2.628229 |
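Each row of `describe()` is an ordinary statistic. `std` is the sample standard deviation (dividing by `n - 1`), and the percentiles use linear interpolation between the sorted values. The sketch below checks this against column `A` from the table above. The values are copied from the rounded output, so the results match only to about six decimal places.

```python
import pandas as pd

# Column A as printed above (rounded to six decimals)
a = pd.Series([0.110834, 0.643621, 0.479122, -1.711659, 1.239032, -0.226519])

desc = a.describe()

# Each describe() row matches a standalone Series method
print(desc["mean"], a.mean())
print(desc["std"], a.std())           # sample standard deviation, ddof=1
print(desc["25%"], a.quantile(0.25))  # linear interpolation by default
print(desc["50%"], a.median())
```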


Transposing your data:

In [22]:
df.T

|  | 2013-01-01 | 2013-01-02 | 2013-01-03 | 2013-01-04 | 2013-01-05 | 2013-01-06 |
|---|---|---|---|---|---|---|
| A | 0.110834 | 0.643621 | 0.479122 | -1.711659 | 1.239032 | -0.226519 |
| B | -0.234889 | 0.390751 | 1.268756 | 0.171306 | 2.28253 | 0.061397 |
| C | -0.271206 | -0.034837 | -1.249518 | 0.361514 | 0.305351 | 0.713015 |
| D | 0.541894 | 0.412795 | -0.204053 | 0.175856 | 0.807809 | 2.628229 |
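`df.T` is shorthand for `df.transpose()`. Rows become columns, so a 6×4 frame becomes 4×6, and transposing twice gives back the original. There is one caveat. When a frame mixes column dtypes, each transposed column holds values of several kinds, so pandas generally falls back to `object` dtype. A small sketch, using a made-up two-column `mixed` frame for the caveat:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.ones((6, 4)), index=dates, columns=list("ABCD"))

print(df.T.shape)         # (4, 6)
print(df.T.T.equals(df))  # True

# An integer column plus a string column: after transposing, each new
# column holds an int and a string, so the result uses object dtype
mixed = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
print(mixed.T.dtypes)
```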


There are also ways of sorting by axis, by label and by values, but I will leave those in the tutorial for now and come back to them later if I need them.
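For reference, here is a minimal sketch of two of those sorting calls on a small made-up frame. `sort_index` reorders by axis labels (here the column names, in reverse). `sort_values` reorders the rows by the values in one column.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [3, 1, 2], "B": [0.5, -1.0, 2.0], "C": [9, 8, 7]},
    index=["r1", "r2", "r3"],
)

# Sort by axis labels: axis=1 sorts the column names, here in descending order
by_labels = df.sort_index(axis=1, ascending=False)
print(by_labels.columns.tolist())  # ['C', 'B', 'A']

# Sort rows by the values in column B (ascending by default)
by_values = df.sort_values(by="B")
print(by_values.index.tolist())  # ['r2', 'r1', 'r3']
```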