We are going to learn about the main data types of pandas: Series and DataFrames (which are in fact built on top of the NumPy array object). We explore these concepts through some examples.

Let us start by importing pandas and numpy.

In [None]:
import numpy as np
import pandas as pd

# 1- SERIES

The first main data type we will learn about for pandas is the Series data type. 

A Series is very similar to a NumPy array. What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
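
As a small illustration of the two differences described above, the sketch below compares positional access on a NumPy array with label access on a Series, and builds a Series of mixed Python objects. The variable names (`first_by_position`, `mixed`, and so on) are chosen for this example only.

```python
import numpy as np
import pandas as pd

# A NumPy array is addressed by integer position only.
arr = np.array([10, 20, 30])
first_by_position = arr[0]

# A Series carries an index of labels alongside the same values.
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
first_by_label = ser['a']

# A Series may also hold arbitrary Python objects: a string, a float, a function.
mixed = pd.Series(['text', 3.5, len])

print(first_by_position, first_by_label, mixed.dtype)
```

The mixed Series falls back to the generic `object` dtype, which is flexible but gives up the speed of numeric dtypes.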

### Creating a Series

You can convert a list, a NumPy array, or a dictionary to a Series:

In [None]:
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

**Using Lists**

In [None]:
pd.Series(data=my_list)

In [None]:
pd.Series(data=my_list,index=labels)

In [None]:
pd.Series(my_list,labels)

**NumPy Arrays**

In [None]:
pd.Series(arr)

In [None]:
pd.Series(arr,labels)

**Dictionary**

In [None]:
pd.Series(d)
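
When a dictionary is converted, its keys become the index. The sketch below also shows what appears to happen when an explicit `index` is passed together with a dictionary: the labels are looked up in the dictionary, and labels with no matching key get `NaN`. The label `'z'` is a made-up example of a missing key.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Dictionary keys become the index labels, values become the data.
from_dict = pd.Series(d)

# With an explicit index, values are picked by key; 'z' has no key, so it is NaN.
reordered = pd.Series(d, index=['c', 'a', 'z'])

print(from_dict)
print(reordered)
```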

### Using an Index

The key to using a Series is understanding its index. Pandas uses these index labels or positions for fast lookups of information (it works much like a hash table or dictionary).

Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:

In [None]:
ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

In [None]:
ser1

In [None]:
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

In [None]:
ser2

In [None]:
ser1['USA']
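
The dictionary analogy extends a little further than bracket lookup. The following sketch, which rebuilds the same data under the illustrative name `ser`, shows membership tests against the index, a `get` call with a default for a missing label, and selection of several labels at once.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

# Membership tests check the index labels, like the keys of a dict.
has_usa = 'USA' in ser
has_italy = 'Italy' in ser

# get() returns a default instead of raising KeyError for a missing label.
italy_value = ser.get('Italy', default=0)

# Passing a list of labels returns a smaller Series.
subset = ser[['USA', 'Japan']]

print(has_usa, has_italy, italy_value)
print(subset)
```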

Operations between Series are aligned on the index labels; a label that appears in only one of the two Series produces NaN in the result:

In [None]:
ser1 + ser2
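
To make the alignment explicit, here is a sketch that repeats the addition and then uses `Series.add` with `fill_value=0`, one possible way to treat a missing label as zero instead of getting `NaN`. Whether zero is the right substitute depends on the data; it is an assumption of this example.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Values are matched by label, not by position.
# 'USSR' and 'Italy' each exist in only one Series, so their sum is NaN.
total = ser1 + ser2

# add() with fill_value substitutes 0 for a label missing on one side.
filled = ser1.add(ser2, fill_value=0)

print(total)
print(filled)
```

Because `NaN` is a floating-point value, the result is converted from integers to floats.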

# 2- DATAFRAMES

DataFrames are the workhorse of pandas. We can think of a DataFrame as a bunch of Series objects put together to share the same index.
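
One way to see the "Series sharing an index" picture is to build a small DataFrame from a dictionary of Series and then pull a column and a row back out. The names `w`, `x`, and `shared_index` are illustrative only.

```python
import pandas as pd

shared_index = ['A', 'B', 'C']
w = pd.Series([1.0, 2.0, 3.0], index=shared_index)
x = pd.Series([10.0, 20.0, 30.0], index=shared_index)

# Each dictionary key becomes a column name; the Series are aligned on their index.
df = pd.DataFrame({'W': w, 'X': x})

column = df['W']    # a Series indexed by the row labels
row = df.loc['B']   # a Series indexed by the column names

print(df)
print(type(column), type(row))
```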

In [None]:
from numpy.random import randn
np.random.seed(101)

In [None]:
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
df.head()
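
Note that the index above contains the label 'Store 1' twice. Index labels do not have to be unique, and the sketch below (a trimmed version of the same data, with the 'Item Purchased' column left out for brevity) shows how that affects label lookups.

```python
import pandas as pd

purchase_1 = pd.Series({'Name': 'Chris', 'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn', 'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod', 'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3],
                  index=['Store 1', 'Store 1', 'Store 2'])

# A repeated label returns every matching row, so the result is a DataFrame.
store_1 = df.loc['Store 1']

# A unique label returns a single row as a Series.
store_2 = df.loc['Store 2']

# Row label and column name together: one cost per 'Store 1' row.
store_1_costs = df.loc['Store 1', 'Cost']

print(store_1)
print(store_2)
```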

In [None]:
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())

In [None]:
df

## Selection and Indexing

Let's learn the various methods to grab data from a DataFrame.

In [None]:
df['W']

In [None]:
# Pass a list of column names
df[['W','Z']]

In [None]:
# Attribute (dot) access (NOT RECOMMENDED!)
df.W
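
A short sketch of why dot access such as `df.W` is discouraged: it only works when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute. The column names `count` and `Item Purchased` below are picked to trigger both problems.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2],
                   'count': [3, 4],
                   'Item Purchased': ['a', 'b']})

# For a simple name, both forms return the same column.
same = df.W.equals(df['W'])

# 'count' collides with the DataFrame.count method, so df.count is the method.
dot_count_is_method = callable(df.count)
bracket_count = df['count']

# A name containing a space cannot be written with dot access at all.
spaced = df['Item Purchased']

print(same, dot_count_is_method)
```

Bracket notation works for every column name, which is why it is the safer habit.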

DataFrame columns are just Series:

In [None]:
type(df['W'])

**Creating a new column:**

In [None]:
df['new'] = df['W'] + df['Y']

In [None]:
df

**Removing Columns**

In [None]:
df.drop('new',axis=1)

In [None]:
# Not inplace unless specified!
df

In [None]:
df.drop('new',axis=1,inplace=True)

In [None]:
df

Can also drop rows this way:

In [None]:
df.drop('E',axis=0)
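
An alternative to `inplace=True` is to reassign the result of `drop`. The sketch below uses a small stand-in DataFrame (not the random one above) and the `columns=` and `index=` keywords, which express the same thing as `axis=1` and `axis=0`.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2], 'Y': [3, 4]}, index=['A', 'B'])
df['new'] = df['W'] + df['Y']

# drop() returns a new DataFrame; the original still has the column.
dropped = df.drop('new', axis=1)
still_there = 'new' in df.columns

# Reassigning keeps the change without inplace=True.
df = df.drop(columns='new')

# Rows are dropped by index label in the same way.
without_a = df.drop(index='A')

print(df)
print(without_a)
```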

**Selecting Rows**

In [None]:
df.loc['A']

Or select by position instead of label:

In [None]:
df.iloc[2]

**Selecting a subset of rows and columns**

In [None]:
df.loc['B','Y']

In [None]:
df.loc[['A','B'],['W','Y']]
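
The difference between `loc` (labels) and `iloc` (integer positions) is easiest to see side by side. This sketch uses a small DataFrame with fixed values so the results are predictable; it also shows that label slices include their end point while position slices do not.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['A', 'B', 'C'],
                  columns=['W', 'X', 'Y'])

# The same cell, addressed by label and by position.
by_label = df.loc['B', 'Y']
by_position = df.iloc[1, 2]

# Label slices include both ends: rows 'A' and 'B'.
label_slice = df.loc['A':'B']

# Position slices exclude the stop, like Python lists: row 0 only.
position_slice = df.iloc[0:1]

print(by_label, by_position)
print(label_slice)
print(position_slice)
```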

### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to boolean indexing in NumPy:

In [None]:
df

In [None]:
df>0

In [None]:
df[df>0]

In [None]:
df[df['W']>0]

In [None]:
df[df['W']>0]['Y']

In [None]:
df[df['W']>0][['Y','X']]

To combine two conditions, use | (or) and & (and), wrapping each condition in parentheses:

In [None]:
df[(df['W']>0) & (df['Y'] > 1)]
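
The sketch below works through combined conditions on a small DataFrame with fixed values, so the selected rows can be checked by eye. It also shows that the Python keyword `and` raises a `ValueError` on Series, and uses `.loc` with a mask to pick rows and a column in one step, as one alternative to chaining two pairs of brackets.

```python
import pandas as pd

df = pd.DataFrame({'W': [1.0, -1.0, 2.0, -2.0],
                   'Y': [2.0, 3.0, -1.0, -4.0]},
                  index=['A', 'B', 'C', 'D'])

# & requires both conditions to hold: only row 'A'.
both = df[(df['W'] > 0) & (df['Y'] > 1)]

# | requires at least one condition to hold: rows 'A', 'B' and 'C'.
either = df[(df['W'] > 0) | (df['Y'] > 1)]

# The keyword `and` asks for a single True/False, which a Series cannot give.
try:
    df[(df['W'] > 0) and (df['Y'] > 1)]
    keyword_and_failed = False
except ValueError:
    keyword_and_failed = True

# Row mask and column label in a single .loc call.
y_where_w_positive = df.loc[df['W'] > 0, 'Y']

print(both)
print(either)
print(y_where_w_positive)
```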

## More Index Details

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!

In [None]:
df

In [None]:
# Reset to default 0,1...n index
df.reset_index()

In [None]:
newind = 'CA NY WY OR CO'.split()

In [None]:
df['States'] = newind

In [None]:
df

In [None]:
df.set_index('States')

In [None]:
df

In [None]:
df.set_index('States',inplace=True)

In [None]:
df
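
The round trip between a column and the index can be sketched on a small stand-in DataFrame. As in the cells above, `set_index` returns a new object unless `inplace=True` is given; note that the previous index labels are discarded when the new index replaces them.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2, 3]}, index=['A', 'B', 'C'])
df['States'] = ['CA', 'NY', 'WY']

# set_index moves the column into the index and returns a new DataFrame.
indexed = df.set_index('States')

# reset_index moves the index back into a column and restores 0, 1, ..., n-1.
restored = indexed.reset_index()

print(indexed)
print(restored)
```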

In [None]:
import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()

## Info on Unique Values

In [None]:
df['col2'].unique()

In [None]:
df['col2'].nunique()

In [None]:
df['col2'].value_counts()
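
For reference, this sketch runs the three calls on the same `col2` values and stores the results under illustrative names, with comments describing what each one returns.

```python
import pandas as pd

df = pd.DataFrame({'col2': [444, 555, 666, 444]})

# unique() returns the distinct values in order of first appearance.
unique_values = df['col2'].unique()

# nunique() returns how many distinct values there are.
n_unique = df['col2'].nunique()

# value_counts() returns a Series of counts, most frequent value first.
counts = df['col2'].value_counts()

print(unique_values)
print(n_unique)
print(counts)
```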

## CSV

### CSV Input

In [None]:
df = pd.read_csv('example')
df

### CSV Output

In [None]:
df.to_csv('example2',index=False)
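
The effect of `index=False` can be checked without touching the disk. The sketch below writes to an in-memory `io.StringIO` buffer instead of a file such as 'example2', and uses a small made-up DataFrame, since the contents of 'example' are not shown here.

```python
import io
import pandas as pd

df = pd.DataFrame({'a': [0, 4], 'b': [1, 5]})

# index=False leaves the row labels out of the CSV text.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading the text back gives the same columns and values.
round_trip = pd.read_csv(io.StringIO(csv_text))

# With the default index=True, the row labels come back as an extra unnamed column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))

print(csv_text)
print(round_trip.columns.tolist(), with_index.columns.tolist())
```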