# <b>Why Pandas?</b>

- Intrinsic data alignment
- Data standardization
- Data operations
- Functions for handling missing data
- Data structures that handle major use cases

# <b>Features of Pandas</b>

- Powerful data structure
- Fast and efficient data wrangling
- Easy data aggregation & transformation
- Tools for reading/writing data
- Intelligent and automated data alignment
- High performance merging and joining of data sets
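
As a rough illustration of the merging and aggregation features listed above, the sketch below joins two small tables and sums a column per group. The `orders` and `customers` tables and their column names are hypothetical example data, not part of the original material.

```python
import pandas as pd

# Hypothetical example data, invented for this illustration
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [50, 20, 30, 10],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "country": ["USA", "Germany", "Japan"],
})

# Join the two tables on their shared key column
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregate: total order amount per country
totals = merged.groupby("country")["amount"].sum()
print(totals)
```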

## <b>Pandas Series</b>

- Very similar to a NumPy array.
- What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a number location.
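
A minimal sketch of that difference, assuming the labels `'a'`, `'b'`, `'c'` chosen here for illustration: the NumPy array can only be indexed by position, while the Series accepts either a label or, through `iloc`, a position.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
ser = pd.Series(arr, index=["a", "b", "c"])  # illustrative labels

from_array = arr[1]          # NumPy array: position only
by_label = ser["b"]          # Series: lookup by label
by_position = ser.iloc[1]    # Series: lookup by position

print(from_array, by_label, by_position)
```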

## <b>How to create a Series?</b>

You can convert a list, NumPy array, or dictionary to a Series. To create a Series, call the `pd.Series()` constructor.

In [8]:
# Creating a pandas series from a list

import pandas as pd

In [2]:
my_list = [10, 20, 30]

In [3]:
series = pd.Series(my_list)

In [4]:
print(series)

0    10
1    20
2    30
dtype: int64


In [5]:
print(series.index)

RangeIndex(start=0, stop=3, step=1)


In [6]:
print(series.values)

[10 20 30]


In [11]:
# Creating a Series from a NumPy array

import pandas as pd
import numpy as np

index = ['a', 'b', 'c']
arr = np.array([10, 20, 30])

pd.Series(data=arr, index=index)

a    10
b    20
c    30
dtype: int64

In [12]:
# Creating a series from dictionary

d = {'a': 10, 'b': 20, 'c': 30}
pd.Series(d)

a    10
b    20
c    30
dtype: int64

## <b>Using Index in a Series</b>

- The key to using a Series is understanding its index.
- Pandas makes use of these index names or numbers by allowing for fast lookups of information (works like a hash table or dictionary).

In [14]:
# Custom index

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Get the value of 'USA'
print(ser1['USA'])

1


In [15]:
print(ser1)

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64


In [16]:
print(ser2)

USA        1
Germany    2
Italy      5
Japan      4
dtype: int64


In [17]:
print(ser1 + ser2)

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
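
The `NaN` entries above appear because `'Italy'` and `'USSR'` each exist in only one of the two Series, and the result becomes `float64` because `NaN` is a floating-point value. One way to inspect or avoid the missing values is sketched below; it uses `isna()` and `Series.add()` with `fill_value=0`, which treats a label missing from one side as zero. Whether zero is the right substitute depends on the data, so take this as one option rather than a rule.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Plain addition aligns on labels; unmatched labels become NaN
total = ser1 + ser2

# Boolean mask marking the labels that had no partner
missing = total.isna()
print(missing)

# Treat a label missing from one Series as 0 instead of producing NaN
filled = ser1.add(ser2, fill_value=0)
print(filled)
```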


## <b>What are DataFrames?</b>

DataFrames store data in rectangular grids that are easy to inspect. Each row of the grid corresponds to the measurements or values of one instance, while each column is a vector containing data for a specific variable. This means that the values in a row do not need to share a type, although they can: they may be numeric, character, logical, etc.

Data frames in Python come with the Pandas library. A Pandas DataFrame consists of three main components: the data, the index, and the columns.
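
The three components can be inspected directly through the `values`, `index`, and `columns` attributes, and `dtypes` shows that each column carries its own type. The small `frame` below, including its column names and row labels, is hypothetical data used only for illustration.

```python
import pandas as pd

# Hypothetical example: each column holds a different type of value
frame = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 45], "member": [True, False]},
    index=["r1", "r2"],
)

print(frame.values)   # the data, as a 2-D array
print(frame.index)    # the row labels
print(frame.columns)  # the column labels
print(frame.dtypes)   # one dtype per column
```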

<b>Creating DataFrames manually</b>

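To create a DataFrame manually, use the `pd.DataFrame()` constructor. You pass it the data you want to store and, optionally, the index (row labels) and the columns (column labels). If you omit the index or the columns, Pandas numbers them starting from 0.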
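
As a small sketch of how the index and column arguments behave, the example below builds one DataFrame with explicit labels and one without from the same NumPy array. The array contents and the labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative 3x2 array holding the values 0..5
data = np.arange(6).reshape(3, 2)

# Explicit row and column labels
labelled = pd.DataFrame(data, index=["a", "b", "c"], columns=["x", "y"])
print(labelled)

# No labels given: rows and columns are numbered from 0
default = pd.DataFrame(data)
print(default)
```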

In [18]:
df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]])

In [19]:
print(df)

   0  1  2
0  1  2  3
1  3  4  5
2  5  6  7
3  7  8  9


In [23]:
print("Shape:", df.shape)
print("Index:", df.index)
print(' ')
print(df)

Shape: (4, 3)
Index: RangeIndex(start=0, stop=4, step=1)
 
   0  1  2
0  1  2  3
1  3  4  5
2  5  6  7
3  7  8  9


### <b>Understanding the Index</b>

Before you start with adding, deleting and renaming the components of your DataFrame, you first need to know how you can select these elements.

This is where indexes come into play. Just as you can use the index of a book to locate a chapter, you can use the `loc[]` (label-based) or `iloc[]` (position-based) indexers in pandas to access particular rows and columns of your DataFrame. Note that both are used with square brackets, not called like functions.

In [24]:
df2 = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]], index = ['a', 'b', 'c', 'd'], columns = ['x', 'y', 'z'])

In [25]:
print(df2)

   x  y  z
a  1  2  3
b  3  4  5
c  5  6  7
d  7  8  9
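
With labelled rows and columns in place, selection by label and by position can be sketched as follows. The block rebuilds the same `df2` so that it runs on its own; the variable names for the results are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]],
    index=['a', 'b', 'c', 'd'],
    columns=['x', 'y', 'z'],
)

row_b = df2.loc['b']          # row with label 'b', returned as a Series
cell = df2.loc['c', 'z']      # single value: row 'c', column 'z'
first_row = df2.iloc[0]       # first row by position
cell_pos = df2.iloc[2, 2]     # same cell as above, selected by position
col_y = df2['y']              # a whole column by name

# Label slices with loc include both endpoints
block = df2.loc['a':'b', ['x', 'z']]

print(row_b)
print(cell, cell_pos)
print(block)
```

Note that a label slice such as `'a':'b'` includes the end label, unlike a positional slice with `iloc`, which excludes the end position.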
