# Pandas Series and DataFrame

Pandas is a powerful, open-source Python library for data manipulation and analysis. It provides data structures and functions for performing efficient operations on data.

# Pandas Series


A Pandas Series is a one-dimensional, array-like structure that holds data of a single data type, along with an associated index. A Series represents a single column of data.

In [1]:
import pandas as pd
student_data = dict(BUSENISS=25, AI=30, JS=30, JAVA=27)  # number of students per program

series_programs = pd.Series(student_data)
series_programs

BUSENISS    25
AI          30
JS          30
JAVA        27
dtype: int64

In [47]:
series_programs.iloc[0]

np.int64(25)

In [40]:
series_programs.keys()

Index(['BUSENISS', 'AI', 'JS', 'JAVA'], dtype='object')
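The cells above reach into the Series by position (`iloc`) and list its labels (`keys()`). As a minimal sketch, assuming the same data as `series_programs`, the following contrasts label-based and position-based access; the variable names `by_label`, `by_position`, `labels` and `values` are introduced here only for illustration.

```python
import pandas as pd

# Same data as the Series built above
series_programs = pd.Series(dict(BUSENISS=25, AI=30, JS=30, JAVA=27))

by_label = series_programs.loc["AI"]    # look up by index label
by_position = series_programs.iloc[1]   # look up by integer position (second element)

labels = list(series_programs.index)    # the index labels, the same ones .keys() returns
values = series_programs.to_numpy()     # the underlying values as a NumPy array

print(by_label, by_position)
print(labels)
print(values)
```

Both lookups return the same element here because `"AI"` is the label at position 1.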

#### Another Series from a list

In [2]:
import random as rnd
rnd.seed(42)

dice_list = [rnd.randint(1,6) for _ in range(5)]
dice_list

[6, 1, 1, 6, 3]

In [3]:
dice_series = pd.Series(dice_list)
dice_series

0    6
1    1
2    1
3    6
4    3
dtype: int64

In [4]:
dice_series.min(), dice_series.max(), dice_series.mean()

(np.int64(1), np.int64(6), np.float64(3.4))
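Beyond calling `min`, `max` and `mean` one at a time, a Series can summarise itself in one call. The sketch below assumes the same five rolls shown above, written out literally so it does not depend on the random seed; `summary` and `counts` are illustrative names.

```python
import pandas as pd

# The five dice rolls from above, written out literally
dice_series = pd.Series([6, 1, 1, 6, 3])

summary = dice_series.describe()      # count, mean, std, min, quartiles, max
counts = dice_series.value_counts()   # how often each face was rolled

print(summary)
print(counts)
```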

# DataFrame


A Pandas DataFrame is a two-dimensional, tabular data structure with rows and columns. Each column is a Series, and different columns can hold different data types, so a DataFrame extends the concept of a Series to multiple columns that share one index. It also supports more complex operations such as pivoting, merging, joining, and grouping.
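To make the description concrete, here is a small sketch, assuming a two-column table similar to the ones used in this notebook. It checks that a column is a Series, prints the per-column dtypes, and runs one simple grouping; the names `df`, `students` and `per_size` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Students": [25, 30, 30, 27], "Language": ["Python", "C#", "Kotlin", "Java"]},
    index=["AI", "NET", "APP", "JAVA"],
)

# Selecting one column gives back a Series that shares the DataFrame's index
students = df["Students"]
print(type(students).__name__)  # Series

# Each column keeps its own dtype
print(df.dtypes)

# Grouping: count how many programs share each class size
per_size = df.groupby("Students")["Language"].count()
print(per_size)
```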

In [5]:
df_programs = pd.DataFrame(series_programs, columns=("Numbers of students",))
df_programs

          Numbers of students
BUSENISS                   25
AI                         30
JS                         30
JAVA                       27


In [6]:
# create two Series objects from dictionaries
students = pd.Series(dict(AI=25, NET=30, APP=30, JAVA=27))
language = pd.Series(dict(AI="Python", NET="C#", APP="Kotlin", JAVA="Java"))



In [7]:
df_programs = pd.DataFrame({"Students": students, "Language": language})
df_programs

      Students  Language
AI          25    Python
NET         30        C#
APP         30    Kotlin
JAVA        27      Java


In [73]:
import numpy as np

# build a DataFrame directly from a dict of columns plus an explicit index
pd.DataFrame(
    {
        "Students": np.array((25, 30, 30, 27)),
        "Language": ["Python", "C#", "Kotlin", "Java"],
    },
    index=["AI", ".NET", "APP", "Java"],
)

      Students  Language
AI          25    Python
.NET        30        C#
APP         30    Kotlin
Java        27      Java


# Data selection

In [79]:
df_programs["Students"]

AI      25
NET     30
APP     30
JAVA    27
Name: Students, dtype: int64

In [80]:
df_programs[["Language", "Students"]]

      Language  Students
AI      Python        25
NET         C#        30
APP     Kotlin        30
JAVA      Java        27


In [82]:
df_programs["Language"]["AI"]

'Python'
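The last cell reads a single value with two chained lookups, first the column and then the row label. As a sketch, assuming the same `df_programs` as above, `.loc[row, column]` and `.at[row, column]` reach the same cell in one step. For assignment, the single-step form is the one the pandas documentation recommends, because chained assignment may not modify the original DataFrame.

```python
import pandas as pd

students = pd.Series(dict(AI=25, NET=30, APP=30, JAVA=27))
language = pd.Series(dict(AI="Python", NET="C#", APP="Kotlin", JAVA="Java"))
df_programs = pd.DataFrame({"Students": students, "Language": language})

chained = df_programs["Language"]["AI"]      # column first, then row label
single = df_programs.loc["AI", "Language"]   # row label and column in one lookup
scalar = df_programs.at["AI", "Language"]    # scalar accessor for exactly one cell

print(chained, single, scalar)

# Assign through .loc so the change is made on df_programs itself
df_programs.loc["AI", "Students"] = 26
print(df_programs.loc["AI", "Students"])
```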

# Indexers

In [85]:
df_programs.loc["AI"]

Students        25
Language    Python
Name: AI, dtype: object

In [87]:
df_programs.loc[["JAVA", "APP"]]

      Students  Language
JAVA        27      Java
APP         30    Kotlin


In [89]:
df_programs.loc["AI": "APP"]

     Students  Language
AI         25    Python
NET        30        C#
APP        30    Kotlin


In [92]:
df_programs.iloc[0]

Students        25
Language    Python
Name: AI, dtype: object

In [93]:
df_programs.iloc[0:2]

     Students  Language
AI         25    Python
NET        30        C#
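The outputs above show a difference that is easy to miss: a label slice with `loc` includes its end label, while a positional slice with `iloc` excludes its end position, like ordinary Python slicing. The following sketch, assuming the same `df_programs`, puts the two side by side and also selects rows and columns together; the result names are illustrative.

```python
import pandas as pd

students = pd.Series(dict(AI=25, NET=30, APP=30, JAVA=27))
language = pd.Series(dict(AI="Python", NET="C#", APP="Kotlin", JAVA="Java"))
df_programs = pd.DataFrame({"Students": students, "Language": language})

by_label = df_programs.loc["AI":"APP"]   # end label "APP" is included: 3 rows
by_position = df_programs.iloc[0:2]      # end position 2 is excluded: 2 rows

cell = df_programs.iloc[0, 1]                          # first row, second column
subset = df_programs.loc[["JAVA", "APP"], "Students"]  # chosen rows, one column

print(len(by_label), len(by_position))
print(cell)
print(subset)
```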


# Masking

In [94]:
df_programs["Students"] > 25

AI      False
NET      True
APP      True
JAVA     True
Name: Students, dtype: bool

In [98]:
df_programs.query("Students > 25")

      Students  Language
NET         30        C#
APP         30    Kotlin
JAVA        27      Java
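The comparison in the first cell of this section only produces a boolean Series; it becomes a filter once it is passed back into the DataFrame. A minimal sketch, assuming the same `df_programs`, shows that indexing with the mask gives the same rows as `query`, and that masks can be combined with `&` (each condition needs its own parentheses). The names `mask`, `filtered`, `queried` and `combined` are illustrative.

```python
import pandas as pd

students = pd.Series(dict(AI=25, NET=30, APP=30, JAVA=27))
language = pd.Series(dict(AI="Python", NET="C#", APP="Kotlin", JAVA="Java"))
df_programs = pd.DataFrame({"Students": students, "Language": language})

mask = df_programs["Students"] > 25            # boolean Series, one entry per row
filtered = df_programs[mask]                   # keep only the rows where mask is True
queried = df_programs.query("Students > 25")   # same filter written as a query string

# Combine two conditions with & and parenthesise each one
combined = df_programs[(df_programs["Students"] > 25) & (df_programs["Language"] != "Java")]

print(filtered.equals(queried))
print(combined)
```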
