# Analysis with Pandas

In this section of the course we will learn how to use pandas for data analysis.

* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining, and Concatenating
* Operations
* Data Input and Output

## Series

The first main data type we will learn about in pandas is the **Series**. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
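As a minimal sketch of that difference (the names `values` and `s` are illustrative, not part of the course material), the same data can be read by position from an array and by label from a Series:

```py
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])                 # plain NumPy array: positions only
s = pd.Series(values, index=['a', 'b', 'c'])    # same data, now with labels

print(values[1])       # access by position
print(s['b'])          # access by label
print(s.to_numpy())    # the Series data as a NumPy array
```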

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

In [2]:
# from pandas import Series

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series. The constructor's signature and the start of its docstring are:

```py
pd.Series(
    data=None,
    index=None,
    dtype: 'Dtype | None' = None,
    name=None,
    copy: 'bool' = False,
    fastpath: 'bool' = False,
) -> 'None'
Docstring:
One-dimensional ndarray with axis labels (including time series).

Parameters
----------
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. If data is a dict, argument order is
    maintained.
index : array-like or Index (1d)
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to
    RangeIndex (0, 1, 2, ..., n) if not provided. If data is dict-like
    and index is None, then the keys in the data are used as the index. If the
    index is not None, the resulting Series is reindexed with the index values.


```
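The last rule in the docstring is easy to miss, so here is a small sketch of it. When `data` is a dictionary and an `index` is also given, pandas looks up each index label among the dictionary keys rather than pairing values by position, and labels without a matching key become `NaN`. The label `'z'` is made up for illustration.

```py
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Only 'a' and 'c' exist as keys; 'z' has no match and becomes NaN
s = pd.Series(d, index=['a', 'c', 'z'])
print(s)
```

Because `NaN` is a floating-point value, the resulting Series has dtype `float64` even though the dictionary held integers.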

In [6]:
labels = ['a', 'b', 'c']            # list of labels
my_list = [10, 20, 30]              # list of values
arr = np.array([10, 20, 30, 40])    # NumPy array (four elements)
d = {'a': 10, 'b': 20, 'c': 30}     # dictionary

**Using Lists**

In [7]:
my_list

[10, 20, 30]

In [8]:
pd.Series(my_list)

0    10
1    20
2    30
dtype: int64

In [9]:
labels

['a', 'b', 'c']

In [10]:
pd.Series(my_list, labels)

a    10
b    20
c    30
dtype: int64

In [11]:
# Positional order is (data, index): here the strings are the data and the numbers are the index
pd.Series(labels, my_list)

10    a
20    b
30    c
dtype: object

In [12]:
pd.Series(data=my_list, index=labels)

a    10
b    20
c    30
dtype: int64

In [13]:
labels1 = ['a', 'b', 'c', 'd']

**NumPy Arrays**

In [14]:
arr

array([10, 20, 30, 40])

In [15]:
pd.Series(data=arr, index = [1, 2, 3, 4])

1    10
2    20
3    30
4    40
dtype: int32

In [17]:
pd.Series(arr)

0    10
1    20
2    30
3    40
dtype: int32

In [18]:
pd.Series(labels1, arr)

10    a
20    b
30    c
40    d
dtype: object

In [19]:
# More examples with a list

In [20]:
my_list.append(10.2)

In [21]:
my_list

[10, 20, 30, 10.2]

In [25]:
pd.Series(my_list, index=labels1)

a    10.0
b    20.0
c    30.0
d    10.2
dtype: float64

In [26]:
pd.Series(index=labels1, data=arr)

a    10
b    20
c    30
d    40
dtype: int32

**Dictionary**

In [27]:
d

{'a': 10, 'b': 20, 'c': 30}

In [29]:
d.keys()

dict_keys(['a', 'b', 'c'])

In [30]:
d.values()

dict_values([10, 20, 30])

In [31]:
# Series from a dictionary: the keys become the index
pd.Series(d)

a    10
b    20
c    30
dtype: int64

### Data in a Series

A pandas Series can hold a variety of object types:

In [32]:
pd.Series(data=labels)

0    a
1    b
2    c
dtype: object

In [33]:
# Even functions (although you are unlikely to need this)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object
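A short sketch of what this means for `dtype`: when the values do not share a single numeric type, pandas falls back to the generic `object` dtype and keeps each value as its original Python object. The variable name `mixed` is illustrative.

```py
import pandas as pd

mixed = pd.Series([1, 'two', 3.0])

print(mixed.dtype)                          # object
print([type(v).__name__ for v in mixed])    # the Python type of each element
```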

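### Using an Index

The key to using a Series is understanding its index. pandas uses the index labels to look up values, much like the keys of a dictionary.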

In [35]:
labels2 = ['USA', 'Germany','USSR', 'Japan']

In [36]:
labels2[0]

'USA'

In [37]:
labels2[0:3:2]

['USA', 'USSR']

In [38]:
ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

In [39]:
ser1['USA':'Japan']

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64
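Note that the label-based slice above includes its end label, unlike ordinary Python slicing. A small sketch contrasting it with position-based slicing through `.iloc`:

```py
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

by_label = ser1['USA':'USSR']   # label slice: both endpoints are included
by_position = ser1.iloc[0:2]    # positional slice: the end position is excluded

print(list(by_label.index))     # ['USA', 'Germany', 'USSR']
print(list(by_position.index))  # ['USA', 'Germany']
```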

In [52]:
ser6 = pd.Series([1, 2, 3, 4])
ser6

0    1
1    2
2    3
3    4
dtype: int64

In [53]:
ser1+ser6

0         NaN
1         NaN
2         NaN
3         NaN
Germany   NaN
Japan     NaN
USA       NaN
USSR      NaN
dtype: float64
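Every value is `NaN` here because pandas aligns the two Series on their labels before adding, and `ser1` (country names) shares no labels with `ser6` (integers 0 to 3). If the intent is to add values by position instead, one option is to drop the labels from one operand first, for example with `.to_numpy()`. This is a sketch of one possible approach, not the only one:

```py
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser6 = pd.Series([1, 2, 3, 4])

# Adding a plain NumPy array skips label alignment and keeps ser1's index
positional_sum = ser1 + ser6.to_numpy()
print(positional_sum)
```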

In [43]:
ser6[0:3:2]

0    1
2    3
dtype: int64

In [44]:
ser6

0    1
1    2
2    3
3    4
dtype: int64

In [55]:
ser5 = pd.Series([1, 2, 3, 4])
ser5

0    1
1    2
2    3
3    4
dtype: int64

In [56]:
(ser5+ser6)*2

0     4
1     8
2    12
3    16
dtype: int64

In [57]:
my_list

[10, 20, 30, 10.2]

In [58]:
(my_list+my_list)*2

[10, 20, 30, 10.2, 10, 20, 30, 10.2, 10, 20, 30, 10.2, 10, 20, 30, 10.2]
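The contrast between the last two results is worth spelling out: for Python lists, `+` concatenates and `*` repeats, whereas for Series both operators work element by element. A minimal sketch with made-up values:

```py
import pandas as pd

nums = [1, 2, 3]
ser = pd.Series(nums)

list_result = (nums + nums) * 2     # concatenate, then repeat
series_result = (ser + ser) * 2     # add element-wise, then multiply each value

print(list_result)                  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
print(series_result.tolist())       # [4, 8, 12]
```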

In [59]:
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

In [60]:
ser1['USA']

1

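Operations between Series are also aligned on the index: values with matching labels are combined, and a label that appears in only one Series produces `NaN`: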

In [61]:
ser1 + ser2

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
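If `NaN` is not what you want for labels that appear in only one Series, the `add` method accepts a `fill_value` that stands in for the missing side. A sketch, assuming a missing entry should count as 0:

```py
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Labels missing from one Series are treated as 0 instead of producing NaN
total = ser1.add(ser2, fill_value=0)
print(total)
```

Whether 0 is a sensible stand-in depends on the data; for counts it usually is, while for measurements a missing value may be better left as `NaN`.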