# Pandas Tutorial: Basics

Welcome to the Pandas basics tutorial. This section covers the fundamentals of Pandas, a Python library for data analysis and manipulation. We start with Series and DataFrames, the core Pandas data structures, and then look at how to inspect, sort, clean, combine, and transform them.


In [21]:
# Importing Pandas
import pandas as pd
# Importing NumPy, which supplies np.nan and the random arrays used below
import numpy as np


## Creating Series

A Series is a one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its index.


In [22]:
# Creating a Series
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)


0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
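
The Series above uses the default integer index. As a minimal sketch of how explicit labels work, the example below builds a Series with string labels and reads a value by label. The names `prices`, `banana_price`, and `missing_labels`, and the fruit labels, are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labels chosen for illustration; they replace the default 0..n-1 index.
prices = pd.Series([1.5, 2.0, np.nan], index=["apple", "banana", "cherry"])

# Label-based access looks a value up by its index label, not its position.
banana_price = prices["banana"]

# isna() marks missing entries; the boolean mask selects their labels.
missing_labels = prices[prices.isna()].index.tolist()

print(banana_price)
print(missing_labels)
```

Because `np.nan` is a floating-point value, a numeric Series that contains it is stored as `float64`, which is why the integers in the tutorial's Series print as `1.0`, `3.0`, and so on.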


## Creating DataFrames

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).


In [23]:
# Creating a DataFrame from a NumPy array of random values (no seed is set, so your numbers will differ)
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)


                   A         B         C         D
2023-01-01 -0.841290  0.081727  0.470027  3.203913
2023-01-02 -0.550541  1.016935 -0.443175 -0.858145
2023-01-03  1.010446 -0.335146 -1.036723 -0.136153
2023-01-04  1.413318 -0.131444 -0.196092  0.353596
2023-01-05 -0.331854 -1.414046  1.002417  1.955026
2023-01-06 -1.666298  1.272363 -0.961934 -0.022752
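
The frame above holds only floats. To illustrate the "potentially heterogeneous" part of the definition, here is a small sketch that builds a DataFrame from a dictionary of columns, each with its own type. The name `people` and all of its column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical column names and values, used only to show mixed column types.
people = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 28],
        "score": [91.5, 88.0, 79.25],
    }
)

# Each column keeps its own dtype: text, integer, and float respectively.
print(people.dtypes)
```

Passing a dictionary is often the most readable way to build a small frame by hand, since each key becomes a column label.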


## Viewing Data

Let's look at the top and bottom rows of the frame. `head()` returns the first five rows by default, and `tail(3)` returns the last three.


In [24]:
# Viewing the top rows of the DataFrame
print(df.head())

# Viewing the bottom rows of the DataFrame
print(df.tail(3))


                   A         B         C         D
2023-01-01 -0.841290  0.081727  0.470027  3.203913
2023-01-02 -0.550541  1.016935 -0.443175 -0.858145
2023-01-03  1.010446 -0.335146 -1.036723 -0.136153
2023-01-04  1.413318 -0.131444 -0.196092  0.353596
2023-01-05 -0.331854 -1.414046  1.002417  1.955026
                   A         B         C         D
2023-01-04  1.413318 -0.131444 -0.196092  0.353596
2023-01-05 -0.331854 -1.414046  1.002417  1.955026
2023-01-06 -1.666298  1.272363 -0.961934 -0.022752
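
Besides `head()` and `tail()`, a few attributes are useful for a quick look at a frame's structure. The sketch below uses a separate frame named `demo` (a hypothetical name, built with a fixed seed that is also an assumption) so that it does not overwrite the tutorial's `df`.

```python
import numpy as np
import pandas as pd

# A frame shaped like the tutorial's df; the seed is an assumption made for repeatability.
rng = np.random.default_rng(0)
demo_dates = pd.date_range("20230101", periods=6)
demo = pd.DataFrame(rng.standard_normal((6, 4)), index=demo_dates, columns=list("ABCD"))

print(demo.shape)       # (number of rows, number of columns)
print(demo.index)       # row labels
print(demo.columns)     # column labels
print(demo.to_numpy())  # the values as a plain NumPy array, without labels
```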


## Basic Operations

Pandas provides many operations for summarizing and rearranging data, including descriptive statistics, sorting, filtering, and grouping. The cell below covers descriptive statistics and sorting.


In [25]:
# Descriptive statistics
print(df.describe())

# Sorting the columns by label in descending order (axis=1)
print(df.sort_index(axis=1, ascending=False))

# Sorting by values
print(df.sort_values(by='B'))


              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.161036  0.081732 -0.194247  0.749247
std    1.162791  0.974555  0.804281  1.523119
min   -1.666298 -1.414046 -1.036723 -0.858145
25%   -0.768603 -0.284221 -0.832244 -0.107803
50%   -0.441198 -0.024858 -0.319633  0.165422
75%    0.674871  0.783133  0.303497  1.554668
max    1.413318  1.272363  1.002417  3.203913
                   D         C         B         A
2023-01-01  3.203913  0.470027  0.081727 -0.841290
2023-01-02 -0.858145 -0.443175  1.016935 -0.550541
2023-01-03 -0.136153 -1.036723 -0.335146  1.010446
2023-01-04  0.353596 -0.196092 -0.131444  1.413318
2023-01-05  1.955026  1.002417 -1.414046 -0.331854
2023-01-06 -0.022752 -0.961934  1.272363 -1.666298
                   A         B         C         D
2023-01-05 -0.331854 -1.414046  1.002417  1.955026
2023-01-03  1.010446 -0.335146 -1.036723 -0.136153
2023-01-04  1.413318 -0.131444 -0.196092  0.353596
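2023-01-01 -0.841290  0.081727  0.470027  3.203913
2023-01-02 -0.550541  1.016935 -0.443175 -0.858145
2023-01-06 -1.666298  1.272363 -0.961934 -0.022752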
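
The cell above does not show filtering or grouping, so here is a minimal sketch of both. The `sales` frame, its column names, and its values are hypothetical and exist only to make the result easy to check by eye.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, 3, 7, 12],
    }
)

# Filtering: a boolean mask keeps the rows where the condition is True.
large_orders = sales[sales["units"] > 5]

# Grouping: split the rows by region, then sum the units within each group.
units_by_region = sales.groupby("region")["units"].sum()

print(large_orders)
print(units_by_region)
```

Here `large_orders` keeps three of the four rows, and `units_by_region` holds one total per region (17 for north, 15 for south).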

## Handling Missing Data

Pandas provides various methods for cleaning and filling missing data.


In [26]:
# Copying the first DataFrame and introducing a missing value in the copy
df_missing = df.copy()
df_missing.at[dates[0], 'A'] = np.nan

# Filling missing data
print(df_missing.fillna(value=5))

# Dropping any rows that have missing data
print(df_missing.dropna(how='any'))


                   A         B         C         D
2023-01-01  5.000000  0.081727  0.470027  3.203913
2023-01-02 -0.550541  1.016935 -0.443175 -0.858145
2023-01-03  1.010446 -0.335146 -1.036723 -0.136153
2023-01-04  1.413318 -0.131444 -0.196092  0.353596
2023-01-05 -0.331854 -1.414046  1.002417  1.955026
2023-01-06 -1.666298  1.272363 -0.961934 -0.022752
                   A         B         C         D
2023-01-02 -0.550541  1.016935 -0.443175 -0.858145
2023-01-03  1.010446 -0.335146 -1.036723 -0.136153
2023-01-04  1.413318 -0.131444 -0.196092  0.353596
2023-01-05 -0.331854 -1.414046  1.002417  1.955026
2023-01-06 -1.666298  1.272363 -0.961934 -0.022752
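
Before filling or dropping, it often helps to see how much data is missing. The sketch below counts missing cells per column and then fills each gap with its column mean. Mean filling is only one possible strategy and is an assumption here, not something the tutorial prescribes; the `readings` frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with one missing cell in each column.
readings = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})

# isna() gives a boolean frame; summing it counts missing values per column.
missing_counts = readings.isna().sum()

# Passing a Series to fillna fills each column with the value under the matching label.
# readings.mean() skips NaN by default, so the means are 2.0 for A and 5.5 for B.
filled = readings.fillna(readings.mean())

print(missing_counts)
print(filled)
```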


## Merging

Pandas provides multiple ways to combine DataFrames, including database-style merging and concatenation. The cell below demonstrates concatenation, which stacks frames along an axis.


In [27]:
# Concatenating Pandas objects
df1 = pd.DataFrame(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(2, 4), columns=['A', 'B', 'C', 'D'])

df_concatenated = pd.concat([df1, df2])
print(df_concatenated)


          A         B         C         D
0 -0.392429 -0.578467 -0.305895 -0.975671
1  0.121913 -1.121945  1.831200  0.501353
2 -2.199546 -0.697106  0.415638 -0.360534
0 -1.286876 -0.544024 -0.894942 -0.027968
1  1.675917  2.864225  1.151580 -1.287241
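
Note that the concatenated frame above repeats the index labels 0 and 1, because `pd.concat` keeps each input's original index. The sketch below shows `ignore_index=True`, which renumbers the rows, and a key-based join with `pd.merge`. The frames `left` and `right` and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# merge defaults to an inner join: only keys present in both tables survive.
merged = pd.merge(left, right, on="key")

# ignore_index=True discards the original labels and renumbers the rows from 0.
stacked = pd.concat([left, left], ignore_index=True)

print(merged)
print(stacked.index.tolist())
```

Passing `how="left"`, `how="right"`, or `how="outer"` to `pd.merge` keeps unmatched keys from one or both sides, with missing values filled in where no match exists.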


## Applying Functions

`DataFrame.apply` passes each column (or each row, with `axis=1`) to a function and collects the results, which makes it a general tool for custom calculations.


In [28]:
# Applying a function to each column
print(df.apply(np.cumsum))

# Applying a lambda function to each column to compute its range (max - min)
print(df.apply(lambda x: x.max() - x.min()))


                   A         B         C         D
2023-01-01 -0.841290  0.081727  0.470027  3.203913
2023-01-02 -1.391831  1.098663  0.026852  2.345768
2023-01-03 -0.381385  0.763517 -1.009870  2.209615
2023-01-04  1.031932  0.632073 -1.205962  2.563211
2023-01-05  0.700079 -0.781974 -0.203545  4.518236
2023-01-06 -0.966219  0.490389 -1.165480  4.495484
A    3.079615
B    2.686409
C    2.039140
D    4.062058
dtype: float64
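
Both calls above work column by column, which is the default for `apply`. As a small sketch of the alternatives, the example below applies the same range function per row with `axis=1`, and uses `Series.map` for element-wise work on one column. The `scores` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame for illustration.
scores = pd.DataFrame({"A": [1.0, 4.0], "B": [3.0, 2.0]})

# axis=0 (the default): the function receives each column as a Series.
col_range = scores.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_range = scores.apply(lambda row: row.max() - row.min(), axis=1)

# Element-wise work on a single column can use Series.map.
doubled_a = scores["A"].map(lambda v: v * 2)

print(col_range)
print(row_range)
print(doubled_a)
```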


This concludes the Pandas basics tutorial. We covered creating Series and DataFrames, viewing data, sorting, handling missing values, concatenating frames, and applying functions. These techniques form the foundation for more advanced data analysis and manipulation in Pandas.
