# Pandas - DataFrame and Series

Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two main data structures: Series and DataFrame. A Series is a one-dimensional, labeled, array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
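You can see the difference in dimensionality by inspecting `ndim` and `shape`. The sketch below uses small, made-up values (`s`, `table`) only for illustration.

```python
import pandas as pd

# A Series: one dimension, a single dtype, and an index of labels
s = pd.Series([1.5, 2.5, 3.5], index=['x', 'y', 'z'])

# A DataFrame: two dimensions; each column may have its own dtype
table = pd.DataFrame({'name': ['a', 'b'], 'score': [90, 85]})

print(s.ndim, table.ndim)                        # 1 2
print(table.shape)                               # (2, 2) -> (rows, columns)
print(list(table.index), list(table.columns))    # [0, 1] ['name', 'score']
print(table.dtypes)                              # one dtype per column
```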

In [1]:
%pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.





In [2]:
import pandas as pd

In [3]:
## Create a Series from a list (default integer index 0..n-1)
data = [1, 2, 3, 4, 5]

series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [None]:
## Create a Series from a dictionary (the keys become the index)
data = {'a':1, 'b':2, 'c':3}

dict_series = pd.Series(data)
dict_series

a    1
b    2
c    3
dtype: int64
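When a dictionary is combined with an explicit `index`, the index decides which keys are kept and in what order. This sketch (with a hypothetical `prices` dict) shows that a label with no matching key is filled with `NaN`, which also forces the dtype to `float64`.

```python
import pandas as pd

prices = {'a': 1, 'b': 2, 'c': 3}

# 'b' is dropped, 'c' and 'a' are reordered, 'd' has no key so it becomes NaN
s = pd.Series(prices, index=['c', 'a', 'd'])
print(s)
# c    3.0
# a    1.0
# d    NaN
# dtype: float64
```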

In [None]:
## Create a Series from a list with a custom index
data = [10, 20, 30]
index = ['a', 'b', 'c']
pd.Series(data=data, index=index)

a    10
b    20
c    30
dtype: int64
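With a custom index, a Series can be read by label (`.loc`) or by position (`.iloc`), and arithmetic between two Series lines values up by label rather than by position. A minimal sketch, where `other` is an illustrative second Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.loc['b'])   # by label -> 20
print(s.iloc[1])    # by position -> 20

# Values are matched by label; 'a' and 'd' exist in only one Series, so they become NaN
other = pd.Series([1, 2, 3], index=['b', 'c', 'd'])
total = s + other
print(total)
```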

## DataFrame

In [None]:
## Create a DataFrame from a dictionary
data={
    'Name': ["Azim", "Alim", "Aziz"],
    'Age': [19, 19, 21],
    'City': ['Tashkent', 'Chicago', 'Samarkand']
}

df = pd.DataFrame(data=data)

print(df)
print(type(df))

   Name  Age       City
0  Azim   19   Tashkent
1  Alim   19    Chicago
2  Aziz   21  Samarkand
<class 'pandas.core.frame.DataFrame'>
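In the dictionary form, each key becomes a column name and each list supplies that column's values row by row. A few attributes are handy for checking the result (the variable name `people` is illustrative):

```python
import pandas as pd

data = {
    'Name': ["Azim", "Alim", "Aziz"],
    'Age': [19, 19, 21],
    'City': ['Tashkent', 'Chicago', 'Samarkand'],
}
people = pd.DataFrame(data)

print(people.shape)            # (3, 3): 3 rows, 3 columns
print(list(people.columns))    # ['Name', 'Age', 'City']
print(people.head(2))          # first two rows only
```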


In [9]:
## Create a DataFrame from a list of dictionaries

data=[
    {'Name': 'Azimjon', 'Age': 19, 'City': 'Tashkent'},
    {'Name': 'Bobir', 'Age': 20, 'City': 'New York'},
    {'Name': 'Aziz', 'Age': 21, 'City': 'Tashkent'}
]

df = pd.DataFrame(data=data)
df

      Name  Age      City
0  Azimjon   19  Tashkent
1    Bobir   20  New York
2     Aziz   21  Tashkent
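In the list-of-dictionaries form, each dictionary is one row, and the records do not all need the same keys. This sketch uses hypothetical records to show that columns are the union of all keys and missing entries become `NaN` (which turns the integer `Age` column into `float64`):

```python
import pandas as pd

records = [
    {'Name': 'Azimjon', 'Age': 19},          # no 'City' key
    {'Name': 'Bobir', 'City': 'New York'},   # no 'Age' key
]
frame = pd.DataFrame(records)
print(frame)
```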


In [11]:
type(df['Name'])

pandas.core.series.Series
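A single column label returns a Series, while a list of labels (double brackets) returns a DataFrame, even when the list holds only one name. A small sketch with an illustrative `people` frame:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Azimjon', 'Bobir'], 'Age': [19, 20]})

col = people['Name']      # single label -> Series
sub = people[['Name']]    # list of labels -> DataFrame with one column
print(type(col).__name__, type(sub).__name__)   # Series DataFrame
```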

In [14]:
## Label-based access: row label 0, column 'Name'
df.loc[0, 'Name']

'Azimjon'
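`.loc` takes a row selector and a column selector, and the row selector can also be a boolean condition. A sketch using data shaped like the frame above (rebuilt here so the block runs on its own):

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Azimjon', 'Bobir', 'Aziz'],
    'Age': [19, 20, 21],
    'City': ['Tashkent', 'New York', 'Tashkent'],
})

# Boolean row filter plus a column label
in_tashkent = people.loc[people['City'] == 'Tashkent', 'Name']
print(in_tashkent.tolist())   # ['Azimjon', 'Aziz']

# Several rows and several columns by label
print(people.loc[[0, 2], ['Name', 'Age']])
```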

In [None]:
## Position-based access: first row, first column
df.iloc[0, 0]

'Azimjon'
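One practical difference between the two accessors shows up in slices: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position, as in ordinary Python slicing. A minimal sketch:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Azimjon', 'Bobir', 'Aziz'], 'Age': [19, 20, 21]})

by_label = people.loc[0:1]      # end label 1 is included -> 2 rows
by_position = people.iloc[0:1]  # end position 1 is excluded -> 1 row
print(len(by_label), len(by_position))   # 2 1
```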

In [None]:
## Access a single element by row label and column name
df.at[0, 'Age']

np.int64(19)
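`.at` has a position-based counterpart, `.iat`, and both can also assign a single value. A short sketch on an illustrative frame:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Azimjon', 'Bobir'], 'Age': [19, 20]})

print(people.iat[1, 1])       # row position 1, column position 1 -> 20
people.at[1, 'Age'] = 30      # set one value in place
print(people.loc[1, 'Age'])   # 30
```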

In [23]:
## Add a new column (one value per row)
df['Salary'] = [10000, 4000, 3000]
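New columns can also be computed from existing ones or filled with a single scalar, and a list must contain exactly one value per row. In this sketch the column names `AgeNextYear` and `Country` are hypothetical:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Azimjon', 'Bobir', 'Aziz'], 'Age': [19, 20, 21]})

# Computed from an existing column (vectorized, no explicit loop)
people['AgeNextYear'] = people['Age'] + 1

# A scalar is broadcast to every row
people['Country'] = 'Unknown'

# A list with the wrong length is rejected
try:
    people['Salary'] = [10000, 4000]   # 2 values for 3 rows
except ValueError as err:
    print('ValueError:', err)

print(people)
```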

In [25]:
## Remove the column again (axis=1 refers to columns)
df.drop('Salary', axis=1, inplace=True)
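Without `inplace=True`, `drop` returns a new DataFrame and leaves the original unchanged. The `columns=`/`index=` keywords are an equivalent, more explicit alternative to `axis`. A sketch on an illustrative frame:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Azimjon', 'Bobir'], 'Salary': [10000, 4000]})

trimmed = people.drop(columns='Salary')   # same as people.drop('Salary', axis=1)
print(list(trimmed.columns))   # ['Name']
print(list(people.columns))    # ['Name', 'Salary'] (original untouched)

# Dropping rows by index label instead (axis=0, the default)
print(people.drop(index=0))
```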

In [30]:
## Vectorized update: add 1 to every value in the column
df['Age'] = df['Age'] + 1

df

      Name  Age      City
0  Azimjon   20  Tashkent
1    Bobir   21  New York
2     Aziz   22  Tashkent


In [33]:
# Data type of each column
print("Data types:\n", df.dtypes)

# Summary statistics of the numeric columns
print("Statistical summary:\n", df.describe())

# Group by one column and aggregate another: mean age per city
grouped = df.groupby('City')['Age'].mean()
print("Mean age by city:\n", grouped)

Data types:
 Name    object
Age      int64
City    object
dtype: object
Statistical summary:
         Age
count   3.0
mean   21.0
std     1.0
min    20.0
25%    20.5
50%    21.0
75%    21.5
max    22.0
Mean age by city:
 City
New York    21.0
Tashkent    21.0
Name: Age, dtype: float64
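A grouped column can also be summarized with several aggregations at once by passing a list of function names to `agg`. This sketch rebuilds data like the frame above so it runs on its own:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Azimjon', 'Bobir', 'Aziz'],
    'Age': [20, 21, 22],
    'City': ['Tashkent', 'New York', 'Tashkent'],
})

# One row per city, one column per aggregation
summary = people.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```

Group keys are sorted by default, so `New York` comes before `Tashkent` in the result.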
