# Pandas
NumPy is not just a library; it is the foundation of a whole ecosystem of libraries, including **Pandas**.

Whenever we have to deal with series (1D) or tabular (2D) data, Pandas is the right tool for the job.

Pandas treats 1D data as a **`Series`** and 2D as a **`DataFrame`**.

Let's begin with the Series.

## Series

A series can be built from most collections. Its syntax is:

**`series = pandas.Series(data)`**

Let's begin by importing pandas:

In [1]:
import pandas as pd

### From a list

Let's start by making a series from a list.

In [2]:
listA = [1,3,5,7]
listA

[1, 3, 5, 7]

In [3]:
seriesA = pd.Series(listA)
seriesA

0    1
1    3
2    5
3    7
dtype: int64

As we can see, the series pairs linear indices (0, 1, 2, ...) with the values. This is a little like a dictionary, except that here the indices are (a) generated automatically and (b) in sequential order.
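
A small sketch of how that default index behaves, assuming the same four values as `listA`; the variable names below are illustrative only.

```python
import pandas as pd

series_demo = pd.Series([1, 3, 5, 7])

# The default index is a RangeIndex that counts up from 0.
print(series_demo.index)   # RangeIndex(start=0, stop=4, step=1)

# A value is looked up through its index label, much like a dictionary key.
print(series_demo[2])      # 5

# Labels and values can each be pulled out as plain lists.
index_labels = series_demo.index.tolist()
values = series_demo.tolist()
```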

We can further see the difference by comparing the types:

In [4]:
print(type(listA))
print(type(seriesA))

<class 'list'>
<class 'pandas.core.series.Series'>


### Using a Tuple

Similarly, we can make a series using a tuple as well:

In [5]:
first20Odd = (x for x in range(1,41,2))
tupleA = tuple(first20Odd)
tupleA
# print(type(first20Odd))  # uncomment to check first20Odd's type

(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39)

**Important Note:** Before proceeding further, remember that wrapping `x for x in range(1,41,2)` in **`[]`** or **`{}`** makes a list or a set respectively, but wrapping it in parentheses does not make a tuple. It makes a generator object, which can then be used to build any collection (a list, a tuple or a set).
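
The difference can be checked directly. This is a minimal sketch using the same `range(1,41,2)`; the names `odd_list`, `odd_set` and `odd_gen` are made up for the illustration.

```python
odd_list = [x for x in range(1, 41, 2)]   # list comprehension
odd_set = {x for x in range(1, 41, 2)}    # set comprehension
odd_gen = (x for x in range(1, 41, 2))    # generator expression, not a tuple

print(type(odd_list))
print(type(odd_set))
print(type(odd_gen))

# A generator can be consumed only once.
odd_tuple = tuple(odd_gen)
leftover = tuple(odd_gen)   # empty: the generator is already exhausted
```

The last two lines are why it is worth storing the tuple (or list) rather than reusing the generator.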

Let's come back to the topic and make the series.

In [6]:
seriesB = pd.Series(tupleA)
seriesB

0      1
1      3
2      5
3      7
4      9
5     11
6     13
7     15
8     17
9     19
10    21
11    23
12    25
13    27
14    29
15    31
16    33
17    35
18    37
19    39
dtype: int64

### From a Set

Similarly, we can convert a set into a series as well.... Can we?

In [7]:
setA = {2, -2, 0, 19}

seriesC = pd.Series(setA)

seriesC

TypeError: ignored

**Ouch!** What went wrong here? A set is unordered, so its elements have no positions. Just as we can't index into a set, it makes no sense to build a series from one, because a series needs an index for every value.
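
If the values of a set are still needed in a series, one possible workaround (a sketch, not the only option) is to fix an order first, for example by sorting the set into a list.

```python
import pandas as pd

setA = {2, -2, 0, 19}

# Building a series straight from the set is rejected.
try:
    pd.Series(setA)
except TypeError as err:
    print("TypeError:", err)

# sorted() returns a list, so the order is now explicit.
seriesC = pd.Series(sorted(setA))
print(seriesC)
```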

### From a Dictionary

It makes sense not to have a series for sets. Let's check dictionaries now.

In [8]:
dictionaryProducts = {401:"7UP",780:"Mars",302:"Dawn Bread",307:"Almarai Cheese",412:"Nadec Juice"}

seriesD = pd.Series(dictionaryProducts)

seriesD

401               7UP
780              Mars
302        Dawn Bread
307    Almarai Cheese
412       Nadec Juice
dtype: object

Yes! A Pandas Series is closely modelled on the built-in dictionary. If the data is a dictionary, its keys become the explicit index; for the other types, Pandas provides a default index (0 onwards).
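
A short sketch of what those explicit keys allow, assuming a shortened version of the products dictionary; `series_products` is an illustrative name.

```python
import pandas as pd

products = {401: "7UP", 780: "Mars", 302: "Dawn Bread"}
series_products = pd.Series(products)

# The dictionary keys become the index labels, in insertion order.
labels = series_products.index.tolist()

# .loc looks a value up by its label, .iloc by its position.
by_label = series_products.loc[302]
by_position = series_products.iloc[0]

print(labels)
print(by_label, "|", by_position)
```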

**Question:** Can we use explicit keys for other collections as well?




### For Hybrid Collections

Let's explore a bit further by trying hybrid (nested) collections:

**1. List within a List**

**2. Tuple within a List**

**3. Set within a List**

**Note:** For the sake of ease, we will name every series `x`.

In [9]:
listList = [[10,20],[30,40],[50,100]]

x = pd.Series(listList)

x

0     [10, 20]
1     [30, 40]
2    [50, 100]
dtype: object

Apparently, it works. Let's try the same for the other two.



In [10]:
listTuple = [(10,20),(30,40)]

x = pd.Series(listTuple)

x

0    (10, 20)
1    (30, 40)
dtype: object

Similarly,

In [11]:
listSet = [{10,20},{30,40}]

x = pd.Series(listSet)

x

0    {10, 20}
1    {40, 30}
dtype: object

Let's try the same three combinations with a tuple as the outer collection.

In [12]:
tupleTuple = ((10,20),(30,60))

x = pd.Series(tupleTuple)

tupleList = ([10,20],[30,60])

y = pd.Series(tupleList)

tupleSet = ({10,20},{30,60})

z = pd.Series(tupleSet)

Good so far. We already know that a series can't be built from a set. Is that also true for a set that contains other collections? Let's see.

In [13]:
setTuple = {(10,20),(20,30)}

x = pd.Series(setTuple)

TypeError: ignored

But Python (or Pandas specifically) is having none of it. Let's round it off by checking it for NumPy arrays.

### From NumPy arrays

In [14]:
import numpy as np

In [15]:
numPyA = np.array(tupleA)

numPyA

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,
       35, 37, 39])

In [16]:
x = pd.Series(numPyA)
x

0      1
1      3
2      5
3      7
4      9
5     11
6     13
7     15
8     17
9     19
10    21
11    23
12    25
13    27
14    29
15    31
16    33
17    35
18    37
19    39
dtype: int64
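
As a quick check that nothing is lost in the conversion, the sketch below goes from a NumPy array to a series and back. It uses `np.arange` to rebuild the same 20 odd numbers; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

odd_array = np.arange(1, 41, 2)      # the same 20 odd numbers as tupleA
odd_series = pd.Series(odd_array)

# The series keeps the dtype of the array it was built from.
same_dtype = odd_series.dtype == odd_array.dtype

# to_numpy() returns the values as a NumPy array again.
round_trip = odd_series.to_numpy()
print(type(round_trip))
```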

### Series Attributes

A Series object has some attributes too, which we can use for further customization (if required).

#### `index`

Earlier we wondered whether we can use explicit indices with a series. The short answer is yes: we can do it using the `index` attribute.

Since we are overriding the existing indices, we have to set every one of them explicitly, which can be cumbersome (especially when the series is long).

The syntax is straightforward:

`s.index = <index list>`

In [17]:
x.index = [10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200]

x

10      1
20      3
30      5
40      7
50      9
60     11
70     13
80     15
90     17
100    19
110    21
120    23
130    25
140    27
150    29
160    31
170    33
180    35
190    37
200    39
dtype: int64

As we can see, the series now has the updated indices.

**Note:** The number of indices (i.e., the list's length) must equal the number of elements. A mismatch raises a `ValueError`.
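
Two related points, sketched with a short four-element series: labels can also be passed through the `index` parameter of `pd.Series` at creation time, and a length mismatch surfaces as a `ValueError`. The variable names are illustrative.

```python
import pandas as pd

values = [1, 3, 5, 7]

# Labels can be supplied when the series is created.
labelled = pd.Series(values, index=[10, 20, 30, 40])

# For a long series, a range avoids typing every label by hand.
ranged = pd.Series(values, index=range(10, 50, 10))

# Assigning too few (or too many) labels raises a ValueError.
try:
    labelled.index = [10, 20, 30]
except ValueError as err:
    print("ValueError:", err)
```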

#### `name`

Similarly, we can name a series using the `name` attribute.

In [18]:
x.name = "Odd numbers"

x

10      1
20      3
30      5
40      7
50      9
60     11
70     13
80     15
90     17
100    19
110    21
120    23
130    25
140    27
150    29
160    31
170    33
180    35
190    37
200    39
Name: Odd numbers, dtype: int64
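
One place where the name matters, shown as a small sketch: when a series is turned into a one-column dataframe with `to_frame()`, the name becomes the column header. The name can also be passed at creation time through the `name` parameter.

```python
import pandas as pd

# name can be given when the series is created.
odd = pd.Series([1, 3, 5, 7], name="Odd numbers")

# to_frame() builds a one-column dataframe headed by the series name.
odd_frame = odd.to_frame()
print(list(odd_frame.columns))   # ['Odd numbers']
```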

## DataFrame

A Series is good for linear data, but for 2D data we prefer a DataFrame.

There are a number of ways to make a dataframe.

### From Collections

We can make a dataframe from a collection of collections, such as a list of lists, where each inner collection becomes a row.

In [25]:
list1 = [[1,2,3],[4,5,6]]

df4 = pd.DataFrame(list1)

df4

   0  1  2
0  1  2  3
1  4  5  6


Similarly, for tuples:

In [26]:
tupleList1 = [(1,2),(3,4),(4,5)]

df5 = pd.DataFrame(tupleList1)

df5

   0  1
0  1  2
1  3  4
2  4  5


#### For Hybrid Data

Can we make a dataframe of lists containing both numbers and strings? Let's try.

In [27]:
list2 = [[2,3,4],["abc","def","ghi"]]

df6 = pd.DataFrame(list2)

df6

     0    1    2
0    2    3    4
1  abc  def  ghi
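
It works, but there is a cost worth seeing. In the sketch below (variable names are illustrative), every column of the mixed dataframe holds both a number and a string, so Pandas falls back to the generic `object` dtype, while an all-integer dataframe keeps integer columns.

```python
import pandas as pd

mixed = pd.DataFrame([[2, 3, 4], ["abc", "def", "ghi"]])
numeric = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# dtypes lists the data type of each column.
print(mixed.dtypes)
print(numeric.dtypes)
```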


#### For Sets

Can we make it from sets?

In [29]:
set1 = {[2,3,4],["abc","def","ghi"]}

df6 = pd.DataFrame(set1)

df6

TypeError: ignored

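No surprise that it fails, but note where: the `TypeError` is raised by Python itself while building `set1`, because lists are unhashable and cannot be set elements, so `pd.DataFrame` is never reached. In any case, a set is unordered, which makes it a poor fit for a table with ordered rows.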
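
A minimal sketch that isolates the step raising the error in the cell above. No Pandas call is involved: the set literal alone is enough.

```python
try:
    # Fails on this line: a list cannot be an element of a set.
    set1 = {[2, 3, 4], ["abc", "def", "ghi"]}
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```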

### From Series

Since a series is just like a vector (a column vector), we can merge two (or more) series to have 2D data as well.

We can simply merge two Series as a list:

`seriesList = [series1,series2]`

In [19]:
series1 = pd.Series([1,2,3])
series2 = pd.Series([2,4,6])

seriesList = [series1,series2]

seriesList

[0    1
 1    2
 2    3
 dtype: int64,
 0    2
 1    4
 2    6
 dtype: int64]

Now we can convert it simply into a dataframe as:

`pd.DataFrame(<seriesList>)`

In [20]:
df1 = pd.DataFrame(seriesList)

df1

   0  1  2
0  1  2  3
1  2  4  6


Each series has become a row of the dataframe. We can also try a tuple or a set to combine the two Series. What will the behaviour be then?

In [21]:
seriesTuple = (series1,series2)
#seriesSet = {series1,series2} # won't work: a Series is unhashable, so it can't be a set element

df2 = pd.DataFrame(seriesTuple)
df2
#df3 = pd.DataFrame(seriesSet)

   0  1  2
0  1  2  3
1  2  4  6


#### Using Dictionaries to merge Series

Apparently, using a tuple or a list gives more or less the same result, but `0`, `1`, `2` rarely make sense as column headers. We can use meaningful ids instead.

And which collection type allows us to attach an id to the data? Bingo, the dictionary!

We will make a dictionary as:

**`<seriesDictionary> = {<"key1">:<series1>,<"key2">:<series2>}`**

In [22]:
seriesDictionary = {"Temperature":series1,"Pressure":series2}

Now we can convert it simply into a dataframe as:

`pd.DataFrame(<seriesDictionary>)`

In [23]:
df3 = pd.DataFrame(seriesDictionary)

df3

   Temperature  Pressure
0            1         2
1            2         4
2            3         6
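
Note the change in orientation: a list of series gave one row per series, while a dictionary gives one column per series. The sketch below puts the two side by side and adds `pd.concat` with `axis=1` as one more possible way to get named columns; the variable names are illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3])
series2 = pd.Series([2, 4, 6])

# A list of series: each series becomes a row.
as_rows = pd.DataFrame([series1, series2])

# A dictionary of series: each series becomes a column named by its key.
as_columns = pd.DataFrame({"Temperature": series1, "Pressure": series2})

# pd.concat with axis=1 also places the series side by side as columns.
side_by_side = pd.concat([series1, series2], axis=1, keys=["Temperature", "Pressure"])

print(as_rows.shape, as_columns.shape)
```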


### From CSV Files

We can also load a CSV file from disk. `pd.read_csv` reads the file and returns it directly as a DataFrame, so no separate conversion step is needed.

Its syntax is:

**`x = pd.read_csv(<path relative to the notebook location>)`**

Let's go to the **UCI repository** and download any CSV dataset, say [Students Success](https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success).
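
Until a real file is downloaded, `read_csv` can be tried on a small in-memory stand-in. Everything in this sketch is hypothetical: the CSV text and its column names are made up and are not taken from the UCI dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for a downloaded file; these columns are invented.
csv_text = "student_id,age,passed\n1,19,yes\n2,21,no\n3,20,yes\n"

# read_csv accepts a path or a file-like object and returns a DataFrame.
students = pd.read_csv(io.StringIO(csv_text))

print(type(students))
print(students.shape)   # (3, 3)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file's path. If a file uses a delimiter other than a comma, `read_csv` takes a `sep` argument for that.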


---To be continued---

## DataFrame Operations

Pandas has a huge world of functions and applications. Making the dataframe is just the start (it's like creating a table in a database). We can perform a number of operations on it, and this is where the game begins.
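
As a small taste of what such operations look like, here is a sketch on the Temperature/Pressure data from earlier. These four calls are only an illustrative selection, not a complete list.

```python
import pandas as pd

df = pd.DataFrame({"Temperature": [1, 2, 3], "Pressure": [2, 4, 6]})

first_rows = df.head(2)                 # first two rows
pressure = df["Pressure"]               # select one column (a Series)
warm = df[df["Temperature"] > 1]        # keep rows matching a condition
mean_pressure = df["Pressure"].mean()   # aggregate a column

print(warm)
print(mean_pressure)
```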

---