# Pandas Basics
In this chapter, you will get a brief introduction to Pandas series
and dataframes, the two basic data structures for storing data in
Pandas. You will see how to create these data structures and perform
some basic operations on them. You will then study how to import
datasets into a Pandas dataframe from various input sources. Finally,
the chapter concludes with techniques for handling missing data in
Pandas dataframes.

## 1. Pandas Series
A Pandas series is a data structure that stores data in the form of a
column. A series is normally used to store information about a
particular attribute in your dataset. Let’s see how you can create a
series in Pandas.

### 1.1. Creating Pandas Series
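There are several ways to create a series with Pandas, including from
a NumPy array, a Python list, a scalar value, or a dictionary. The
cells in this section show each in turn. The first script imports the
Pandas module and then calls the Series() class constructor to create
an empty series.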

In [2]:
# Import the pandas library and give it the alias 'pd'
import pandas as pd

# Create an empty pandas Series object
my_series = pd.Series()

# Print the empty Series to the console
print(my_series)

Series([], dtype: object)
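
Because an empty series has no values to infer a type from, the reported dtype is only a default, and some older pandas releases warn when none is given. The following is a minimal sketch, assuming you already know the element type you intend to store. The name `empty_scores` is illustrative and not part of the original example.

```python
import pandas as pd

# Hypothetical name: an empty Series with an explicitly chosen element type
empty_scores = pd.Series(dtype="float64")

print(empty_scores.dtype)   # float64
print(len(empty_scores))    # 0
print(empty_scores.empty)   # True
```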


In [3]:
# Import the pandas library and give it the alias 'pd'
import pandas as pd

# Import the numpy library and give it the alias 'np'
import numpy as np

# Create a NumPy array with the values 10, 20, 30, 40, 50
my_array = np.array([10, 20, 30, 40, 50])

# Convert the NumPy array into a Pandas Series
my_series = pd.Series(my_array)

# Print the Pandas Series to the console
print(my_series)

0    10
1    20
2    30
3    40
4    50
dtype: int32
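
The `int32` in the output above most likely reflects the default integer type that NumPy chose on the machine where the cell was run. On other platforms or NumPy versions, the same cell may report `int64`. As a sketch, assuming you want the same dtype everywhere, you can request it explicitly when building the array:

```python
import numpy as np
import pandas as pd

# Ask for 64-bit integers instead of relying on the platform default
my_array = np.array([10, 20, 30, 40, 50], dtype=np.int64)
my_series = pd.Series(my_array)

print(my_series.dtype)  # int64
```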


In [4]:
# Create a NumPy array with five integer elements
my_array = np.array([10, 20, 30, 40, 50])

# Create a Pandas Series using the NumPy array
# Assign custom indexes ("num1" to "num5") to the series
my_series = pd.Series(my_array, index=["num1", "num2", "num3", "num4", "num5"])

# Print the Pandas Series to display its data along with custom indexes
print(my_series)

num1    10
num2    20
num3    30
num4    40
num5    50
dtype: int32
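
A series with custom indexes keeps its labels and its data as two separate parts. The sketch below, which reuses the values from the cell above, shows one way to inspect each part. The variable names `labels` and `values` are illustrative.

```python
import pandas as pd

my_series = pd.Series(
    [10, 20, 30, 40, 50],
    index=["num1", "num2", "num3", "num4", "num5"]
)

# The index holds the labels; to_numpy() returns the underlying data
labels = my_series.index.tolist()
values = my_series.to_numpy()

print(labels)
print(values)

# The 'in' operator checks the index labels, not the stored values
print("num3" in my_series)  # True
```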


In [5]:
# Create a Pandas Series with a list of values [10, 20, 30, 40, 50]
# Assign custom index labels ["num1", "num2", "num3", "num4", "num5"] to each value
my_series = pd.Series([10, 20, 30, 40, 50], index=["num1", "num2", "num3", "num4", "num5"])

# Print the Series to display its contents
print(my_series)


num1    10
num2    20
num3    30
num4    40
num5    50
dtype: int64


In [6]:
# Create a pandas Series with a scalar value 25
# The scalar 25 will be assigned to all the specified index labels
my_series = pd.Series(25, index=["num1", "num2", "num3", "num4", "num5"])

# Print the created Series
print(my_series)

num1    25
num2    25
num3    25
num4    25
num5    25
dtype: int64


In [7]:
# Define a dictionary with three key-value pairs
# Keys are 'num1', 'num2', 'num3', and values are 6, 7, and 8 respectively
my_dict = {
    'num1': 6,
    'num2': 7,
    'num3': 8
}

# Create a Pandas Series from the dictionary
# The dictionary keys become the index of the Series
# The dictionary values become the data of the Series
my_series = pd.Series(my_dict)

# Print the resulting Series to the console
print(my_series)


num1    6
num2    7
num3    8
dtype: int64
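
When you build a series from a dictionary, you can also pass an `index` argument to choose which keys to keep and in what order. The sketch below uses the label `'num4'`, which is not in the dictionary, to show that a label without a matching key produces a missing value (NaN) and that the series then becomes floating point.

```python
import pandas as pd

my_dict = {'num1': 6, 'num2': 7, 'num3': 8}

# Each label is looked up in the dictionary; 'num4' has no entry
my_series = pd.Series(my_dict, index=['num3', 'num1', 'num4'])

print(my_series)

# Count how many entries are missing
print(my_series.isna().sum())  # 1
```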


### 1.2. Useful Operations on Pandas Series
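Let’s look at some useful operations you can perform on a Pandas
series: accessing items, finding the minimum, maximum, mean, and
median, checking the data type, and converting the series to a list.
You can access series items by position with .iloc or by index label
with square brackets.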

In [9]:
# Create a pandas Series with values and custom index labels
my_series = pd.Series(
    [10, 20, 30, 40, 50],
    index=["num1", "num2", "num3", "num4", "num5"]
)

# Access by position with .iloc; using my_series[0] on a labeled
# series raises a FutureWarning in recent pandas versions
print(my_series.iloc[0])  # Output: 10

# Access by label with square brackets
print(my_series['num3'])  # Output: 30

10
30
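
Position-based and label-based access also work on ranges of items. The following sketch, built on the same series, uses `.iloc` for positions and `.loc` for labels. Note that a position slice excludes its end point, while a label slice includes it. The variable names are illustrative.

```python
import pandas as pd

my_series = pd.Series(
    [10, 20, 30, 40, 50],
    index=["num1", "num2", "num3", "num4", "num5"]
)

by_label = my_series.loc["num3"]             # single item by label
by_position = my_series.iloc[2]              # single item by position

position_slice = my_series.iloc[1:3]         # positions 1 and 2
label_slice = my_series.loc["num2":"num4"]   # num2, num3, and num4

print(by_label, by_position)
print(position_slice.tolist())  # [20, 30]
print(label_slice.tolist())     # [20, 30, 40]
```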


In [10]:
# Create a Pandas Series (1D array-like object) with some integer values
my_series = pd.Series([5, 8, 2, 11, 9])

# Find and print the minimum value in the Series using NumPy's min function
print(np.min(my_series))

# Find and print the maximum value in the Series using NumPy's max function
print(np.max(my_series))


2
11
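
A series also offers its own `min()` and `max()` methods, so NumPy is not strictly required here. As a short sketch on the same data, `idxmin()` and `idxmax()` additionally report the index labels at which those values occur.

```python
import pandas as pd

my_series = pd.Series([5, 8, 2, 11, 9])

# Smallest and largest values
print(my_series.min())     # 2
print(my_series.max())     # 11

# Index labels where the smallest and largest values are stored
print(my_series.idxmin())  # 2
print(my_series.idxmax())  # 3
```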


In [11]:
# Create a Pandas Series object with a list of numbers
my_series = pd.Series([5, 8, 2, 11, 9])

# Calculate and print the mean (average) of the values in the Series
print(my_series.mean())


7.0


In [12]:
# Create a Pandas Series (a one-dimensional array) with a list of numbers
my_series = pd.Series([5, 8, 2, 11, 9])

# Calculate and print the median value of the series
print(my_series.median())

8.0
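
If you want the mean and median together with other summary statistics, one option is the `describe()` method. The sketch below applies it to the same data. The name `summary` is illustrative.

```python
import pandas as pd

my_series = pd.Series([5, 8, 2, 11, 9])

# describe() returns a new Series holding count, mean, std, min,
# the 25%, 50% (median), and 75% quantiles, and max
summary = my_series.describe()

print(summary)
print(summary["mean"])  # 7.0
print(summary["50%"])   # 8.0
```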


In [13]:
# Create a Pandas Series object from a list of integers
my_series = pd.Series([5, 8, 2, 11, 9])

# Print the data type (dtype) of the elements stored in the Series
print(my_series.dtype)

int64
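
Once you know the dtype of a series, you may want to change it. A minimal sketch, assuming you need floating-point values, uses `astype()`, which returns a new series and leaves the original unchanged. The name `as_float` is illustrative.

```python
import pandas as pd

my_series = pd.Series([5, 8, 2, 11, 9])

# Convert the integer values to 64-bit floats in a new Series
as_float = my_series.astype("float64")

print(as_float.dtype)   # float64
print(my_series.dtype)  # int64
```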


In [14]:
# Create a Pandas Series from a Python list of integers
my_series = pd.Series([5, 8, 2, 11, 9])

# Convert the Pandas Series to a Python list using the .tolist() method
# and print the resulting list
print(my_series.tolist())

[5, 8, 2, 11, 9]
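
Besides a Python list, you can also get the data back as a NumPy array with `to_numpy()`. The sketch below shows both conversions side by side. The names `as_list` and `as_array` are illustrative.

```python
import numpy as np
import pandas as pd

my_series = pd.Series([5, 8, 2, 11, 9])

as_list = my_series.tolist()     # list of plain Python integers
as_array = my_series.to_numpy()  # NumPy array holding the same values

print(isinstance(as_list[0], int))       # True
print(isinstance(as_array, np.ndarray))  # True
```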


## 2. Pandas Dataframe
A Pandas dataframe is a tabular data structure that stores data in
rows and columns. By convention, rows correspond to records and
columns correspond to attributes. Put simply, a Pandas dataframe is a
collection of series that share a common index.
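
To make the relationship between the two structures concrete, here is a small sketch that assembles a dataframe from two series and then pulls one column back out as a series. The subject and score values are sample data chosen for illustration.

```python
import pandas as pd

# Two Series that share the same default index (0, 1, 2)
subjects = pd.Series(["Mathematics", "English", "History"])
scores = pd.Series([85, 91, 95])

# Each dictionary key becomes a column name
my_df = pd.DataFrame({"Subject": subjects, "Score": scores})

# Selecting a single column returns a Series again
score_column = my_df["Score"]

print(my_df.shape)                          # (3, 2)
print(isinstance(score_column, pd.Series))  # True
```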

### 2.1. Creating a Pandas Dataframe
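As with a series, there are multiple ways to create a Pandas
dataframe.
To create an empty dataframe, call the DataFrame class constructor
from the Pandas module with no arguments. To create a dataframe that
holds data, you can pass a list of lists along with the column names.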

In [16]:
# Create an empty pandas DataFrame
my_df = pd.DataFrame()

# Print the empty DataFrame to the console
print(my_df)


Empty DataFrame
Columns: []
Index: []
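
You can confirm that a dataframe is empty without printing it. The sketch below checks the `empty` attribute and the `shape`, and also shows a variation in which the column names are declared up front. The column names `Subject` and `Score` are assumptions chosen for illustration.

```python
import pandas as pd

# A completely empty DataFrame: no rows and no columns
my_df = pd.DataFrame()
print(my_df.empty)  # True
print(my_df.shape)  # (0, 0)

# An empty DataFrame with predefined (hypothetical) column names
typed_df = pd.DataFrame(columns=["Subject", "Score"])
print(typed_df.columns.tolist())  # ['Subject', 'Score']
print(len(typed_df))              # 0
```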


In [17]:
# Create a list of lists where each sublist contains a subject and its corresponding score
scores = [
    ['Mathematics', 85],
    ['English', 91],
    ['History', 95]
]

# Create a DataFrame from the list of lists
# Set column names as 'Subject' and 'Score'
my_df = pd.DataFrame(scores, columns=['Subject', 'Score'])

# Display the DataFrame
print(my_df)

       Subject  Score
0  Mathematics     85
1      English     91
2      History     95
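
A list of lists is only one possible input. As a sketch, the same table can also be built from a dictionary of lists, where each key is a column, or from a list of dictionaries, where each dictionary is a row. The names `from_dict` and `from_records` are illustrative.

```python
import pandas as pd

# Original approach: list of lists plus explicit column names
from_lists = pd.DataFrame(
    [['Mathematics', 85], ['English', 91], ['History', 95]],
    columns=['Subject', 'Score']
)

# Dictionary of lists: each key is a column name
from_dict = pd.DataFrame({
    'Subject': ['Mathematics', 'English', 'History'],
    'Score': [85, 91, 95]
})

# List of dictionaries: each dictionary is one row
from_records = pd.DataFrame([
    {'Subject': 'Mathematics', 'Score': 85},
    {'Subject': 'English', 'Score': 91},
    {'Subject': 'History', 'Score': 95}
])

# All three hold the same rows
print(from_lists.to_dict("records") == from_dict.to_dict("records"))     # True
print(from_lists.to_dict("records") == from_records.to_dict("records"))  # True
```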
