# Pandas: DataFrame and Series

Pandas is a data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional, labeled, array-like object that can hold any data type, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [2]:
import pandas as pd

In [3]:
## Series
## A pandas Series is a one-dimensional labeled array capable of holding any data type. It is similar to a column in a spreadsheet or a SQL table.

import pandas as pd
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
print(type(series))

0    10
1    20
2    30
3    40
4    50
dtype: int64
<class 'pandas.core.series.Series'>
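
Because a Series carries both values and an index, it supports label lookup and element-wise arithmetic without an explicit loop. The following is a minimal sketch using the same values as the cell above; the variable names `first`, `doubled`, and `total` are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# With the default integer index, label 0 is also the first position.
first = series[0]
# Arithmetic is applied element by element and returns a new Series.
doubled = series * 2
# Reductions such as sum() collapse the Series to a single value.
total = series.sum()

print(first, total)
print(doubled.tolist())
```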


In [4]:
## Create a series from a dictionary
data = {'a': 10, 'b': 20, 'c': 30}
series_dict = pd.Series(data)
print(series_dict)

a    10
b    20
c    30
dtype: int64
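
When a Series is built from a dictionary, the keys become the index labels. A short sketch of what that allows, reusing the dictionary from the cell above:

```python
import pandas as pd

series_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

# Look up a value by its label, much like a dictionary.
value_b = series_dict['b']
# The dictionary keys are stored as the index.
labels = list(series_dict.index)
# The `in` operator tests the index labels, not the values.
has_c = 'c' in series_dict

print(value_b, labels, has_c)
```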


In [5]:
## Create a Series with a custom index
data = [10, 20, 30]
index = ['a', 'b', 'c']
series_custom_index = pd.Series(data, index=index)
print(series_custom_index)

a    10
b    20
c    30
dtype: int64
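
One practical consequence of a custom index is that arithmetic between two Series aligns on labels rather than on position. The sketch below assumes a second, hypothetical Series `s2` whose labels are in a different order.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
# Hypothetical second Series with the same labels in reverse order.
s2 = pd.Series([1, 2, 3], index=['c', 'b', 'a'])

# Values are matched by label: 'a' pairs 10 with 3, 'c' pairs 30 with 1.
total = s1 + s2
print(total)
```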


In [6]:
## DataFrame : A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table, or a dictionary of Series objects.
## Create a DataFrame from a dictionary of lists

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
print(type(df))

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
<class 'pandas.core.frame.DataFrame'>
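
After building a DataFrame it is often useful to check its dimensions and column labels. A minimal sketch using the same dictionary of lists; `shape`, `columns`, and column selection are standard DataFrame features.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
# Each dictionary key became a column label, in insertion order.
column_names = list(df.columns)
# Each column is itself a Series.
ages = df['Age']

print(n_rows, n_cols, column_names)
print(ages.mean())
```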


In [7]:
import numpy as np

In [8]:
## Create a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]

df_dict = pd.DataFrame(data)
print(df_dict)
print(type(df_dict))

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
<class 'pandas.core.frame.DataFrame'>
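
With a list of dictionaries, each dictionary is one row, and the records do not all need the same keys. The sketch below uses a hypothetical record that lacks a `City` key to show that pandas fills the gap with a missing value.

```python
import pandas as pd

records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30},  # hypothetical record with no 'City' key
]
df_records = pd.DataFrame(records)

# The missing key shows up as a missing value (NaN) in that row.
print(df_records)
print(df_records['City'].isna().tolist())
```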


In [9]:
## Read a CSV file into a DataFrame
df = pd.read_csv('movies.csv')
print(df)
print(type(df))

    Movie_ID                  Title      Genre  Rating  Revenue_Million  Year  \
0          1       The Dark Horizon   Thriller     6.9              621  1986   
1          2      The Golden Legacy     Action     6.1              945  1994   
2          3      The Crimson Dream  Animation     7.5             1889  2017   
3          4       The Broken Storm   Thriller     6.5             1428  1986   
4          5      The Furious Storm    Mystery     6.9              227  2009   
5          6        The Dark Shadow   Thriller     8.9             2583  2003   
6          7       The Silent Dream   Thriller     9.4             1003  1986   
7          8     The Furious Empire    Romance     7.2             2795  1997   
8          9        The Lost Shadow     Sci-Fi     6.6             1604  1997   
9         10    The Furious Horizon     Sci-Fi     8.9             1342  2005   
10        11       The Golden Storm     Sci-Fi     8.3             1670  2021   
11        12         The Los
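
The `movies.csv` file is not included here, so the following sketch reads the same kind of data from an in-memory string instead. The three rows are copied from the output shown in this notebook; using `io.StringIO` in place of a file path is an assumption made only so the example runs on its own.

```python
import io
import pandas as pd

# A small stand-in for movies.csv, held in memory.
csv_text = """Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK
2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA
3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample[['Title', 'Rating']])
```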

In [10]:
df.head(5) # Display the first 5 rows of the DataFrame

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia


In [11]:
df.tail(5) # Display the last 5 rows of the DataFrame

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
45,46,The Lost Empire,Horror,6.1,1410,2006,170,Quentin Tarantino,India
46,47,The Lost River,Horror,9.1,1977,1994,110,Jordan Peele,Japan
47,48,The Hidden Dream,Sci-Fi,6.1,841,2005,127,Steven Spielberg,UK
48,49,The Hidden Storm,Animation,7.4,2246,2001,88,Greta Gerwig,India
49,50,The Lost Echo,Thriller,6.1,2493,2007,129,James Cameron,Australia
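
Both `head` and `tail` take the number of rows as an argument and default to 5 when it is omitted. A minimal sketch on a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical frame with ten rows, just to have something to slice.
df_small = pd.DataFrame({'Movie_ID': range(1, 11)})

first_three = df_small.head(3)   # first 3 rows
last_two = df_small.tail(2)      # last 2 rows
default_head = df_small.head()   # n defaults to 5

print(first_three['Movie_ID'].tolist())
print(last_two['Movie_ID'].tolist())
print(len(default_head))
```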


In [12]:
### Accessing data from a DataFrame
data

[{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
 {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
 {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}]

In [13]:
df

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia
5,6,The Dark Shadow,Thriller,8.9,2583,2003,158,Quentin Tarantino,UK
6,7,The Silent Dream,Thriller,9.4,1003,1986,133,Steven Spielberg,Germany
7,8,The Furious Empire,Romance,7.2,2795,1997,174,Greta Gerwig,Italy
8,9,The Lost Shadow,Sci-Fi,6.6,1604,1997,166,Martin Scorsese,France
9,10,The Furious Horizon,Sci-Fi,8.9,1342,2005,119,Greta Gerwig,France


In [16]:
df['Title'] # Accessing a single column

0          The Dark Horizon
1         The Golden Legacy
2         The Crimson Dream
3          The Broken Storm
4         The Furious Storm
5           The Dark Shadow
6          The Silent Dream
7        The Furious Empire
8           The Lost Shadow
9       The Furious Horizon
10         The Golden Storm
11           The Lost Dream
12       The Eternal Empire
13         The Golden River
14          The Dark Shadow
15    The Shattered Horizon
16         The Golden Dream
17         The Furious Code
18         The Golden River
19       The Crimson Empire
20        The Hidden Legacy
21           The Dark Storm
22         The Silent Dream
23      The Shattered Dream
24         The Broken Dream
25         The Silent River
26         The Crimson Code
27        The Crimson River
28          The Broken Code
29           The Broken Sky
30          The Eternal Sky
31          The Silent Echo
32           The Lost River
33          The Golden Echo
34          The Furious Sky
35          The Hidd

In [17]:
type(df['Title'])

pandas.core.series.Series
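
Selecting with a single column name returns a Series, while selecting with a list of names returns a DataFrame, even when the list has one entry. A short sketch on a three-row sample taken from the output above:

```python
import pandas as pd

movies = pd.DataFrame({
    'Title': ['The Dark Horizon', 'The Golden Legacy', 'The Crimson Dream'],
    'Genre': ['Thriller', 'Action', 'Animation'],
    'Rating': [6.9, 6.1, 7.5],
})

one_column = movies['Title']               # Series
two_columns = movies[['Title', 'Rating']]  # DataFrame
one_as_frame = movies[['Title']]           # still a DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(one_as_frame.shape)
```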

In [18]:
df.loc[0] # Accessing a single row by label

Movie_ID                          1
Title              The Dark Horizon
Genre                      Thriller
Rating                          6.9
Revenue_Million                 621
Year                           1986
Runtime_Min                     171
Director            Martin Scorsese
Country                          UK
Name: 0, dtype: object

In [19]:
df.iloc[0] # Accessing a single row by position

Movie_ID                          1
Title              The Dark Horizon
Genre                      Thriller
Rating                          6.9
Revenue_Million                 621
Year                           1986
Runtime_Min                     171
Director            Martin Scorsese
Country                          UK
Name: 0, dtype: object
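
With the default index, `loc[0]` and `iloc[0]` return the same row because label 0 happens to sit at position 0. The difference shows once the labels are not 0, 1, 2. The sketch below assumes hypothetical index labels 101 to 103 to make that visible.

```python
import pandas as pd

movies = pd.DataFrame(
    {'Title': ['The Dark Horizon', 'The Golden Legacy', 'The Crimson Dream'],
     'Rating': [6.9, 6.1, 7.5]},
    index=[101, 102, 103],  # hypothetical labels, chosen to differ from positions
)

by_label = movies.loc[101]     # row whose label is 101
by_position = movies.iloc[0]   # row at position 0
print(by_label.equals(by_position))

# loc slices include the end label; iloc slices exclude the end position.
label_slice = movies.loc[101:102]
position_slice = movies.iloc[0:1]
print(len(label_slice), len(position_slice))

# There is no row labeled 0, so loc raises KeyError.
try:
    movies.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)
```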

In [20]:
## Accessing a specific element using at
df.at[1, 'Title'] # Accessing a specific element by row label and column label

'The Golden Legacy'

In [21]:
## Accessing a specific element using iat
df.iat[2, 2] # Accessing a specific element by row position and column position

'Animation'
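
The `at` and `iat` accessors can also be used to assign a single value. A minimal sketch on a three-row sample; the replacement rating of 6.5 is an arbitrary illustrative value.

```python
import pandas as pd

movies = pd.DataFrame({
    'Title': ['The Dark Horizon', 'The Golden Legacy', 'The Crimson Dream'],
    'Genre': ['Thriller', 'Action', 'Animation'],
    'Rating': [6.9, 6.1, 7.5],
})

# Assign one cell by row label and column label.
movies.at[1, 'Rating'] = 6.5
# Read the same cell back by row position and column position.
updated = movies.iat[1, 2]
# Other cells are untouched.
genre = movies.at[2, 'Genre']

print(updated, genre)
```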

In [22]:
## Data manipulation with DataFrame
df

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia
5,6,The Dark Shadow,Thriller,8.9,2583,2003,158,Quentin Tarantino,UK
6,7,The Silent Dream,Thriller,9.4,1003,1986,133,Steven Spielberg,Germany
7,8,The Furious Empire,Romance,7.2,2795,1997,174,Greta Gerwig,Italy
8,9,The Lost Shadow,Sci-Fi,6.6,1604,1997,166,Martin Scorsese,France
9,10,The Furious Horizon,Sci-Fi,8.9,1342,2005,119,Greta Gerwig,France


In [23]:
## Adding a new column: repeat ['Yes', 'No'] and trim the list to the number of rows
df['Watched'] = (['Yes', 'No'] * 45)[:len(df)]
df

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country,Watched
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK,Yes
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA,No
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada,Yes
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK,No
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia,Yes
5,6,The Dark Shadow,Thriller,8.9,2583,2003,158,Quentin Tarantino,UK,No
6,7,The Silent Dream,Thriller,9.4,1003,1986,133,Steven Spielberg,Germany,Yes
7,8,The Furious Empire,Romance,7.2,2795,1997,174,Greta Gerwig,Italy,No
8,9,The Lost Shadow,Sci-Fi,6.6,1604,1997,166,Martin Scorsese,France,Yes
9,10,The Furious Horizon,Sci-Fi,8.9,1342,2005,119,Greta Gerwig,France,No


In [24]:
## Remove a column
df.drop('Watched', axis=1) # Returns a new DataFrame; df itself is unchanged

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia
5,6,The Dark Shadow,Thriller,8.9,2583,2003,158,Quentin Tarantino,UK
6,7,The Silent Dream,Thriller,9.4,1003,1986,133,Steven Spielberg,Germany
7,8,The Furious Empire,Romance,7.2,2795,1997,174,Greta Gerwig,Italy
8,9,The Lost Shadow,Sci-Fi,6.6,1604,1997,166,Martin Scorsese,France
9,10,The Furious Horizon,Sci-Fi,8.9,1342,2005,119,Greta Gerwig,France
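
By default `drop` returns a new DataFrame and leaves the original untouched, so the result has to be kept if the change should persist. A sketch of this on a hypothetical two-row frame, using reassignment as an alternative to `inplace=True`:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Title': ['The Dark Horizon', 'The Golden Legacy'],
    'Watched': ['Yes', 'No'],
})

# drop returns a copy without the column; df_demo still has it.
dropped = df_demo.drop(columns='Watched')
still_there = 'Watched' in df_demo.columns

# Reassigning the result makes the change stick.
df_demo = df_demo.drop(columns='Watched')

print(still_there)
print(list(dropped.columns), list(df_demo.columns))
```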


In [25]:
df

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country,Watched
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK,Yes
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA,No
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada,Yes
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK,No
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia,Yes
5,6,The Dark Shadow,Thriller,8.9,2583,2003,158,Quentin Tarantino,UK,No
6,7,The Silent Dream,Thriller,9.4,1003,1986,133,Steven Spielberg,Germany,Yes
7,8,The Furious Empire,Romance,7.2,2795,1997,174,Greta Gerwig,Italy,No
8,9,The Lost Shadow,Sci-Fi,6.6,1604,1997,166,Martin Scorsese,France,Yes
9,10,The Furious Horizon,Sci-Fi,8.9,1342,2005,119,Greta Gerwig,France,No


In [26]:
df.drop('Watched', axis=1, inplace=True) # Modifies df in place and returns None

In [27]:

df

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia
5,6,The Dark Shadow,Thriller,8.9,2583,2003,158,Quentin Tarantino,UK
6,7,The Silent Dream,Thriller,9.4,1003,1986,133,Steven Spielberg,Germany
7,8,The Furious Empire,Romance,7.2,2795,1997,174,Greta Gerwig,Italy
8,9,The Lost Shadow,Sci-Fi,6.6,1604,1997,166,Martin Scorsese,France
9,10,The Furious Horizon,Sci-Fi,8.9,1342,2005,119,Greta Gerwig,France


In [28]:
## Add 1 to every value in the Rating column
df['Rating'] = df['Rating'] + 1
df

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
0,1,The Dark Horizon,Thriller,7.9,621,1986,171,Martin Scorsese,UK
1,2,The Golden Legacy,Action,7.1,945,1994,149,Ridley Scott,USA
2,3,The Crimson Dream,Animation,8.5,1889,2017,120,Christopher Nolan,Canada
3,4,The Broken Storm,Thriller,7.5,1428,1986,96,Patty Jenkins,UK
4,5,The Furious Storm,Mystery,7.9,227,2009,153,Greta Gerwig,Australia
5,6,The Dark Shadow,Thriller,9.9,2583,2003,158,Quentin Tarantino,UK
6,7,The Silent Dream,Thriller,10.4,1003,1986,133,Steven Spielberg,Germany
7,8,The Furious Empire,Romance,8.2,2795,1997,174,Greta Gerwig,Italy
8,9,The Lost Shadow,Sci-Fi,7.6,1604,1997,166,Martin Scorsese,France
9,10,The Furious Horizon,Sci-Fi,9.9,1342,2005,119,Greta Gerwig,France
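
Adding 1 to the column is applied to every row at once, which is why one rating in the output above ends up at 10.4. If the ratings are meant to stay on a 10-point scale (an assumption, since the data does not state the scale), `clip` can cap the result. A minimal sketch:

```python
import pandas as pd

ratings = pd.Series([6.9, 8.9, 9.4], name='Rating')

# Element-wise addition returns a new Series; ratings is unchanged.
bumped = ratings + 1
# Cap values at 10.0, assuming a 10-point rating scale.
capped = bumped.clip(upper=10.0)

print(bumped.round(1).tolist())
print(capped.round(1).tolist())
```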


In [29]:
df = pd.read_csv('movies.csv')
df.head(5)

Unnamed: 0,Movie_ID,Title,Genre,Rating,Revenue_Million,Year,Runtime_Min,Director,Country
0,1,The Dark Horizon,Thriller,6.9,621,1986,171,Martin Scorsese,UK
1,2,The Golden Legacy,Action,6.1,945,1994,149,Ridley Scott,USA
2,3,The Crimson Dream,Animation,7.5,1889,2017,120,Christopher Nolan,Canada
3,4,The Broken Storm,Thriller,6.5,1428,1986,96,Patty Jenkins,UK
4,5,The Furious Storm,Mystery,6.9,227,2009,153,Greta Gerwig,Australia


In [30]:
# Display the data type of each column
print('Data types\n', df.dtypes)

# Summary statistics for the numeric columns
print('\nData description\n', df.describe())

# Group by a column and perform an aggregation
grouped = df.groupby('Genre')['Rating'].mean()
print('\nGrouped by Genre and mean Rating\n', grouped)

Data types
 Movie_ID             int64
Title               object
Genre               object
Rating             float64
Revenue_Million      int64
Year                 int64
Runtime_Min          int64
Director            object
Country             object
dtype: object

Data description
        Movie_ID   Rating  Revenue_Million         Year  Runtime_Min
count  50.00000  50.0000        50.000000    50.000000    50.000000
mean   25.50000   7.6540      1505.060000  2002.200000   127.500000
std    14.57738   1.0801       733.281927    13.114877    26.813224
min     1.00000   6.1000       227.000000  1980.000000    88.000000
25%    13.25000   6.7000       956.750000  1988.500000   105.250000
50%    25.50000   7.5000      1441.000000  2004.000000   121.000000
75%    37.75000   8.6750      2211.000000  2013.750000   149.750000
max    50.00000   9.4000      2795.000000  2025.000000   180.000000

Grouped by Genre and mean Rating
 Genre
Action       7.271429
Adventure    7.166667
Animation    7.
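
A group-by splits the rows by the values of one column, applies an aggregation to each group, and combines the results into one row per group. The sketch below uses six rows copied from the output earlier in this notebook and computes more than one aggregate per genre; the result column names `mean_rating` and `n_movies` are illustrative.

```python
import pandas as pd

movies = pd.DataFrame({
    'Title': ['The Dark Horizon', 'The Golden Legacy', 'The Crimson Dream',
              'The Broken Storm', 'The Furious Storm', 'The Dark Shadow'],
    'Genre': ['Thriller', 'Action', 'Animation',
              'Thriller', 'Mystery', 'Thriller'],
    'Rating': [6.9, 6.1, 7.5, 6.5, 6.9, 8.9],
})

# One aggregate: mean rating per genre (a Series indexed by Genre).
mean_by_genre = movies.groupby('Genre')['Rating'].mean()

# Several aggregates at once, using named aggregation.
summary = movies.groupby('Genre').agg(
    mean_rating=('Rating', 'mean'),
    n_movies=('Title', 'count'),
)

print(mean_by_genre)
print(summary)
```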

In [31]:
df.describe()

Unnamed: 0,Movie_ID,Rating,Revenue_Million,Year,Runtime_Min
count,50.0,50.0,50.0,50.0,50.0
mean,25.5,7.654,1505.06,2002.2,127.5
std,14.57738,1.0801,733.281927,13.114877,26.813224
min,1.0,6.1,227.0,1980.0,88.0
25%,13.25,6.7,956.75,1988.5,105.25
50%,25.5,7.5,1441.0,2004.0,121.0
75%,37.75,8.675,2211.0,2013.75,149.75
max,50.0,9.4,2795.0,2025.0,180.0
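
By default `describe` covers only the numeric columns, which is why `Title`, `Genre`, `Director`, and `Country` are absent above. Passing `include='all'` adds text columns, and each statistic is also available as its own method. A sketch on a small sample taken from the rows shown earlier:

```python
import pandas as pd

movies = pd.DataFrame({
    'Genre': ['Thriller', 'Action', 'Animation',
              'Thriller', 'Mystery', 'Thriller'],
    'Rating': [6.9, 6.1, 7.5, 6.5, 6.9, 8.9],
})

# Numeric columns only (the default).
numeric_stats = movies.describe()
# All columns: text columns get count, unique, top, and freq.
all_stats = movies.describe(include='all')

# The same statistics can be computed one at a time.
median_rating = movies['Rating'].median()
max_rating = movies['Rating'].max()

print(numeric_stats)
print(all_stats.loc[['unique', 'top', 'freq'], 'Genre'])
print(median_rating, max_rating)
```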
