# Introduction to Pandas: Series and DataFrames

The pandas Series and DataFrame are two of the core data structures you need for data analysis with Python and the pandas library. You can use both of them to read, store, and modify data, and more. If you want to learn more about Series and DataFrames, let's jump right into it.

## What are Series and What are DataFrames

Let's first quickly define what a Series and a DataFrame are. Both are structures for holding data, and the main difference between them is their shape. They aren't the same, but they are closely related.

A Series (`pandas.Series`) is a one-dimensional labeled array, similar to a list or an array in Python. It can store any kind of object, and the cool thing is that you can customize its index: instead of the default 0, 1, 2, ..., the labels can be other numbers (`int` or `float`) or strings (`str`).

On the other hand, a DataFrame (`pandas.DataFrame`) is a two-dimensional dataset with rows and columns, which makes it a natural fit for tabular data. As I said, Series and DataFrames are related: each column in a DataFrame is a Series.
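
You can check both claims above with a small sketch. It builds a Series with float labels and a tiny DataFrame, then confirms that selecting one column returns a Series. The names `prices`, `table`, and `price_col` and the values are made up for illustration.

```python
import pandas as pd

# A Series whose index uses float labels instead of the default 0..n-1
prices = pd.Series([9.99, 4.50, 12.00], index=[0.5, 1.5, 2.5])

# A small, made-up DataFrame with two columns
table = pd.DataFrame({"item": ["pen", "cup", "book"], "price": [9.99, 4.50, 12.00]})

# Selecting a single column returns a Series that shares the DataFrame's index
price_col = table["price"]

print(type(price_col))   # <class 'pandas.core.series.Series'>
print(prices.loc[1.5])   # 4.5
```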

## Using the Series

Let's import the `pandas` module first, then use `pd.Series()` to create a series.

In [1]:
import pandas as pd

In [2]:
sr = pd.Series([1, 23, 34, 24, 51, 15])

sr

0     1
1    23
2    34
3    24
4    51
5    15
dtype: int64

Calling our variable `sr` returns a Series with the default numeric index. To change the index, we can either pass the `index` argument when creating the Series or assign new labels to `sr.index` after the Series is created.

In [3]:
# customizing after a series was created
sr.index = ['q','w','e','r','t','y']

sr

q     1
w    23
e    34
r    24
t    51
y    15
dtype: int64

In [4]:
# customizing index while creating
sr = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=['Q', 'W', 'E', 'R', 'T', 'Y'],
    name="Number List"
)

sr

Q    1
W    2
E    3
R    4
T    5
Y    6
Name: Number List, dtype: int64

You can also give your Series a name with the `name` argument. The name acts as a column name, since DataFrame columns are Series.

By simply calling the variable `sr`, we can read the whole Series. To read a single element, we can index it with its label, just like we do with ordinary Python lists and dictionaries.
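
One way to see the name acting as a column name is `Series.to_frame()`, which turns a Series into a one-column DataFrame. A small sketch reusing values like the ones above:

```python
import pandas as pd

sr = pd.Series([1, 2, 3], index=["Q", "W", "E"], name="Number List")

# Converting a named Series to a one-column DataFrame uses the name as the column label
frame = sr.to_frame()

print(frame.columns.tolist())  # ['Number List']
```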

In [5]:
sr["Q"]

1

You can also read several elements at once using a range (slice) of labels. Unlike Python list slicing, slicing by label includes the end label.

In [6]:
sr["Q":"T"]

Q    1
W    2
E    3
R    4
T    5
Name: Number List, dtype: int64

In [7]:
# another way to select by label is the .loc indexer
sr.loc["Q":"R"]

Q    1
W    2
E    3
R    4
Name: Number List, dtype: int64

We can still access elements by their integer position with the `.iloc` indexer. Position-based slicing works like Python list slicing, so the end position is excluded.

In [8]:
sr.iloc[2:-1]

E    3
R    4
T    5
Name: Number List, dtype: int64
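
A side-by-side check of the two slicing styles, rebuilding the same six-element Series, might look like this:

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4, 5, 6], index=list("QWERTY"), name="Number List")

by_label = sr.loc["W":"R"]    # label slice: includes both "W" and "R"
by_position = sr.iloc[1:3]    # position slice: positions 1 and 2, position 3 excluded

print(by_label.tolist())     # [2, 3, 4]
print(by_position.tolist())  # [2, 3]
```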

Applying a condition to a Series returns a boolean Series. We can use it as a mask to select only the elements where the condition is `True`.

In [9]:
sr >= 4

Q    False
W    False
E    False
R     True
T     True
Y     True
Name: Number List, dtype: bool

In [10]:
sr[sr >= 4]

R    4
T    5
Y    6
Name: Number List, dtype: int64
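
Conditions can also be combined. Python's `and`/`or` don't work element-wise on a Series (they raise a `ValueError` about an ambiguous truth value), so pandas uses `&`, `|`, and `~` instead. The variable names below are only for illustration.

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4, 5, 6], index=list("QWERTY"), name="Number List")

# Each comparison needs its own parentheses because & and | bind tighter than >= and <=
between = sr[(sr >= 2) & (sr <= 4)]
outside = sr[(sr < 2) | (sr > 4)]
not_four = sr[~(sr == 4)]

print(between.tolist())  # [2, 3, 4]
print(outside.tolist())  # [1, 5, 6]
```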

To add another element to your Series, assign a value to a new label, just like adding a key to a Python dictionary. Unlike the methods below, this changes `sr` directly.

In [11]:
sr["U"] = 78

sr

Q     1
W     2
E     3
R     4
T     5
Y     6
U    78
Name: Number List, dtype: int64
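
The dictionary comparison goes a bit further. In the small sketch below, `in` checks the index labels, not the values. `Series.get()` returns a default when a label is missing. The label `"Z"` is just an example of a label that doesn't exist.

```python
import pandas as pd

sr = pd.Series([1, 2, 3], index=["Q", "W", "E"])
sr["U"] = 78  # assigning to a new label adds it, like adding a dict key

print("U" in sr)       # True  -> `in` checks the index labels
print(78 in sr)        # False -> 78 is a value, not a label
print(sr.get("Z", 0))  # 0     -> a missing label falls back to the default
```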

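You can also combine a Series with another Series using `pd.concat()`. (Older tutorials use `sr.append()`, but that method was deprecated in pandas 1.4 and removed in pandas 2.0.)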

In [12]:
pd.concat([sr, pd.Series([33, 44, 55])])

Q     1
W     2
E     3
R     4
T     5
Y     6
U    78
0    33
1    44
2    55
dtype: int64

Notice that the original values keep their string labels, while the new values keep the default numeric index of the Series they came from. Each value brings its own label along, so if we give the second Series a custom index before combining, those labels carry over.

In [13]:
pd.concat([sr, pd.Series([66, 77, 88], index=["I", "O", "P"])])

Q     1
W     2
E     3
R     4
T     5
Y     6
U    78
I    66
O    77
P    88
dtype: int64

However, we can discard both sets of labels and number the result from 0 by passing `ignore_index=True`.

In [14]:
pd.concat([sr, pd.Series([55, 56, 67], index=["a", "s", "d"])], ignore_index=True)

0     1
1     2
2     3
3     4
4     5
5     6
6    78
7    55
8    56
9    67
dtype: int64

> Note: Operations like `pd.concat()` return a new Series and leave the original untouched. If we call our Series again, we will see that nothing has changed.

In [15]:
sr

Q     1
W     2
E     3
R     4
T     5
Y     6
U    78
Name: Number List, dtype: int64

In [53]:
# save the result by reassigning it to sr
sr = pd.concat([sr, pd.Series([55, 56, 67], index=["a", "s", "d"])], ignore_index=True)

sr

0     1
1     2
2     3
3     4
4     5
5     6
6    78
7    55
8    56
9    67
dtype: int64

## Using DataFrames

To work with larger sets of data that have multiple columns, we can use DataFrames.
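
Besides the dictionary of lists used below, a DataFrame can also be built from a dictionary of Series. In that case, pandas lines the columns up by index label, and labels missing from a column become `NaN`. A sketch with made-up `ages` and `gpas` Series:

```python
import pandas as pd

ages = pd.Series({"James": 21, "Mark": 19, "Rebecca": 22})
gpas = pd.Series({"Rebecca": 3.72, "James": 3.45})  # different order, "Mark" missing

# Columns built from Series are aligned on their index labels
students = pd.DataFrame({"Age": ages, "GPA": gpas})
print(students)
```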

In [16]:
df = pd.DataFrame({
    "Age": [21,19,22,20,23,23,21],
    "Sex": ["M","M","F","M","F","F","M"],
    "GPA": [3.45,2.98,3.72,2.87,3.90,4.00,1.90]
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

df

| | Age | Sex | GPA |
|---|---|---|---|
| James | 21 | M | 3.45 |
| Mark | 19 | M | 2.98 |
| Rebecca | 22 | F | 3.72 |
| David | 20 | M | 2.87 |
| Lucy | 23 | F | 3.9 |
| Judy | 23 | F | 4.0 |
| Johnny | 21 | M | 1.9 |


Calling `.head()` on a DataFrame or Series returns its first rows. Pass an `int` n to get the first n rows; without an argument, it returns the first 5. To get the last rows instead, use `.tail()`, which works the same way.
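
Both methods also accept a negative n. Per the pandas documentation, `head(-n)` returns all rows except the last n, and `tail(-n)` returns all rows except the first n. A quick sketch using only the `Age` column of the student data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [21, 19, 22, 20, 23, 23, 21]},
    index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"],
)

print(len(df.head()))               # 5 -> default n=5
print(df.head(-2).index.tolist())   # every row except the last 2
print(df.tail(-5).index.tolist())   # every row except the first 5
```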

In [17]:
# default .head() or .tail() will return 5 rows
df.head()

| | Age | Sex | GPA |
|---|---|---|---|
| James | 21 | M | 3.45 |
| Mark | 19 | M | 2.98 |
| Rebecca | 22 | F | 3.72 |
| David | 20 | M | 2.87 |
| Lucy | 23 | F | 3.9 |


In [18]:
# adding a number as an argument
df.tail(3)

| | Age | Sex | GPA |
|---|---|---|---|
| Lucy | 23 | F | 3.9 |
| Judy | 23 | F | 4.0 |
| Johnny | 21 | M | 1.9 |


As with a Series, use `.loc` to select rows by label and `.iloc` to select them by integer position.
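
Both indexers also take a column as a second argument, which lets you read a single cell. `.at` is a related accessor meant for fetching one value by label. A sketch rebuilding the student DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22, 20, 23, 23, 21],
    "Sex": ["M", "M", "F", "M", "F", "F", "M"],
    "GPA": [3.45, 2.98, 3.72, 2.87, 3.90, 4.00, 1.90],
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

by_label = df.loc["Judy", "GPA"]   # row label, column label
by_position = df.iloc[5, 2]        # row position 5, column position 2
fast_label = df.at["Judy", "GPA"]  # single-value accessor by label

print(by_label, by_position, fast_label)  # 4.0 4.0 4.0
```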

In [19]:
# accessing a single row by label
df.loc["Judy"]

Age    23
Sex     F
GPA     4
Name: Judy, dtype: object

In [27]:
# accessing multiple rows through range
df.iloc[1:-1]

| | Age | Sex | GPA |
|---|---|---|---|
| Mark | 19 | M | 2.98 |
| Rebecca | 22 | F | 3.72 |
| David | 20 | M | 2.87 |
| Lucy | 23 | F | 3.9 |
| Judy | 23 | F | 4.0 |


You can also select only the columns you need by passing a list of column names.

In [28]:
# return only the listed columns
df.loc["David":"Johnny", ["Age", "Sex"]]

| | Age | Sex |
|---|---|---|
| David | 20 | M |
| Lucy | 23 | F |
| Judy | 23 | F |
| Johnny | 21 | M |


And if we can pick rows, we can also drop them. `.drop()` takes a list of row labels and returns a new DataFrame without those rows.

In [29]:
df.drop(["James", "Mark"])

| | Age | Sex | GPA |
|---|---|---|---|
| Rebecca | 22 | F | 3.72 |
| David | 20 | M | 2.87 |
| Lucy | 23 | F | 3.9 |
| Judy | 23 | F | 4.0 |
| Johnny | 21 | M | 1.9 |
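
`.drop()` can remove columns too, either through the `columns` argument or by passing `axis=1`. As with rows, the original DataFrame is left unchanged. A small sketch with a trimmed-down version of the student data:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22],
    "Sex": ["M", "M", "F"],
    "GPA": [3.45, 2.98, 3.72],
}, index=["James", "Mark", "Rebecca"])

without_sex = df.drop(columns=["Sex"])        # drop by column name
also_without_sex = df.drop(["Sex"], axis=1)   # equivalent form using axis=1

print(without_sex.columns.tolist())  # ['Age', 'GPA']
```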


You can also filter rows by passing a boolean condition to the DataFrame.

In [40]:
# the condition returns a boolean value for every row
df["GPA"] >= 3.0

James       True
Mark       False
Rebecca     True
David      False
Lucy        True
Judy        True
Johnny     False
Name: GPA, dtype: bool

In [41]:
# keep only the rows where the condition is True
df[df["GPA"] >= 3.0]

| | Age | Sex | GPA |
|---|---|---|---|
| James | 21 | M | 3.45 |
| Rebecca | 22 | F | 3.72 |
| Lucy | 23 | F | 3.9 |
| Judy | 23 | F | 4.0 |
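
A boolean mask can also go into `.loc` together with a list of columns, and conditions can be combined with `&` and `|`. Each comparison needs its own parentheses. A sketch with the same student data. The names `honors` and `female_honors` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22, 20, 23, 23, 21],
    "Sex": ["M", "M", "F", "M", "F", "F", "M"],
    "GPA": [3.45, 2.98, 3.72, 2.87, 3.90, 4.00, 1.90],
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

# Rows that meet the condition, restricted to two columns
honors = df.loc[df["GPA"] >= 3.0, ["Age", "GPA"]]

# Two conditions combined with &
female_honors = df[(df["GPA"] >= 3.0) & (df["Sex"] == "F")]

print(honors.index.tolist())         # ['James', 'Rebecca', 'Lucy', 'Judy']
print(female_honors.index.tolist())  # ['Rebecca', 'Lucy', 'Judy']
```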


To add another column to our DataFrame, we're going to use a Series. Let's create a new column using the condition we used earlier.

In [32]:
# creating a new column out of an existing column
is_passed = pd.Series(df["GPA"] >= 3.0, index=df.index)

is_passed

James       True
Mark       False
Rebecca     True
David      False
Lucy        True
Judy        True
Johnny     False
Name: GPA, dtype: bool

In [33]:
# adding the column to our DataFrame
df["Is Passed"] = is_passed

df

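| | Age | Sex | GPA | Is Passed |
|---|---|---|---|---|
| James | 21 | M | 3.45 | True |
| Mark | 19 | M | 2.98 | False |
| Rebecca | 22 | F | 3.72 | True |
| David | 20 | M | 2.87 | False |
| Lucy | 23 | F | 3.9 | True |
| Judy | 23 | F | 4.0 | True |
| Johnny | 21 | M | 1.9 | False |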
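
Because `df["GPA"] >= 3.0` already returns a boolean Series with the DataFrame's index, it can be assigned directly without wrapping it in `pd.Series()`. When you assign a Series, pandas matches values by index label, not by position. The sketch below shows that behavior. `Scholarship` is a hypothetical column used only for this example.

```python
import pandas as pd

df = pd.DataFrame({"GPA": [3.45, 2.98, 3.72]}, index=["James", "Mark", "Rebecca"])

# The comparison is already a boolean Series aligned with df, so assign it directly
df["Is Passed"] = df["GPA"] >= 3.0

# A Series in a different order is matched by label; labels it lacks become NaN
scholarship = pd.Series({"Rebecca": 500, "James": 250})
df["Scholarship"] = scholarship
print(df)
```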


If you want to rename a column or an index label, for example to fix a mistyped index or change a column name, use the `.rename()` method. Pass dictionaries that map old names to new ones through the `columns` and `index` arguments.

In [34]:
df.rename(
    columns = {
        "Is Passed": "Passed GPA"
    },
    index = {
        "James": "J. Doe",
        "Mark": "M. Villar",
        "David": "D. Martinez",
    }
)

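| | Age | Sex | GPA | Passed GPA |
|---|---|---|---|---|
| J. Doe | 21 | M | 3.45 | True |
| M. Villar | 19 | M | 2.98 | False |
| Rebecca | 22 | F | 3.72 | True |
| D. Martinez | 20 | M | 2.87 | False |
| Lucy | 23 | F | 3.9 | True |
| Judy | 23 | F | 4.0 | True |
| Johnny | 21 | M | 1.9 | False |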
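
Instead of a dictionary, `columns` (and `index`) can also take a function that is applied to every label. The sketch below converts column names to lowercase with underscores. The lambda is just one possible naming rule.

```python
import pandas as pd

df = pd.DataFrame({"Age": [21], "Is Passed": [True]}, index=["James"])

# columns= also accepts a function that is called on each column label
snake = df.rename(columns=lambda name: name.lower().replace(" ", "_"))

print(snake.columns.tolist())  # ['age', 'is_passed']
```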


Remember, none of these changes are saved automatically unless you assign the result to a variable (or reassign the original one).
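
A minimal sketch of the difference is shown below. `.rename()` also has an `inplace=True` option, but this post reassigns the result instead.

```python
import pandas as pd

df = pd.DataFrame({"GPA": [3.45, 2.98]}, index=["James", "Mark"])

df.rename(columns={"GPA": "Grade"})       # result is discarded; df is unchanged
print(df.columns.tolist())                # ['GPA']

df = df.rename(columns={"GPA": "Grade"})  # reassigning keeps the change
print(df.columns.tolist())                # ['Grade']
```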

In [35]:
df

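| | Age | Sex | GPA | Is Passed |
|---|---|---|---|---|
| James | 21 | M | 3.45 | True |
| Mark | 19 | M | 2.98 | False |
| Rebecca | 22 | F | 3.72 | True |
| David | 20 | M | 2.87 | False |
| Lucy | 23 | F | 3.9 | True |
| Judy | 23 | F | 4.0 | True |
| Johnny | 21 | M | 1.9 | False |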


One last thing: pandas Series and DataFrames have many more cool features. In IPython or Jupyter, run `pd.Series?` or `pd.DataFrame?` to quickly view their documentation. In plain Python, `help(pd.Series)` works too.

So there you have it! With Series and DataFrames in your toolkit, you've got the muscle to handle data like a pro. Whether you're diving into data for work, play, or sheer curiosity, pandas has your back. I hope you found this blog helpful. Thanks for reading!