## Series and DataFrames in Pandas

Pandas is a powerful Python library for data manipulation and analysis. It provides two primary data structures: **Series** and **DataFrame**.

### Series

A **Series** is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, etc.). It is similar to a column in a spreadsheet or a database table.

**Example:**


In [3]:
import pandas as pd

# Example of a Series
data = [10, 20, 30, 40]
series = pd.Series(data, name="Numbers")  # pd.Series accepts 'name'; 'columns' belongs to DataFrame
print("Series example:")
print(series)

# Example of a DataFrame
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data_dict)
print("\nDataFrame example:")
print(df)

Series example:
0    10
1    20
2    30
3    40
Name: Numbers, dtype: int64

DataFrame example:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
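
The two structures are closely related: selecting a single column of a DataFrame returns a Series that shares the frame's index. The following is a minimal sketch that reuses the names and ages from the example above; the variable names `people` and `ages` are illustrative only.

```python
import pandas as pd

# Same data as the DataFrame example above
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Selecting one column returns a Series that keeps the row labels
ages = people['Age']
print(type(ages).__name__)  # class name of the selected column
print(ages.name)            # the Series is named after the column
print(ages.loc[1])          # value looked up by index label 1
```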


In [12]:
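# A raw string keeps the backslashes in the Windows path from being read as escape sequences
df = pd.read_csv(r'E:\Code\data.csv')  # read_csv already returns a DataFrame
print(df)
# Replace the default integer index with custom labels (one per row)
df.index = ['a', 'b']
print(df)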

    Name  Age         City
0  Alice   30     New York
1    Bob   25  Los Angeles
    Name  Age         City
a  Alice   30     New York
b    Bob   25  Los Angeles



In [6]:
#series from dict
d=pd.Series(data_dict,index=['Name','Age'])
print(d)

Name    [Alice, Bob, Charlie]
Age              [25, 30, 35]
dtype: object
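
Because `data_dict` maps each key to a list, the Series above stores whole lists as its values. A more typical use, sketched here with hypothetical scalar data (`ages_by_name` and the name `Dana` are assumptions for illustration), maps each key to a single value; the `index` argument then selects and orders the keys.

```python
import pandas as pd

# Hypothetical scalar data: one value per key
ages_by_name = {'Alice': 25, 'Bob': 30, 'Charlie': 35}
s = pd.Series(ages_by_name)
print(s['Bob'])  # label-based lookup

# index= selects and orders keys; a key absent from the dict becomes NaN
s2 = pd.Series(ages_by_name, index=['Charlie', 'Alice', 'Dana'])
print(s2)
```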


In [26]:
# DataFrame access
d = pd.read_csv('E:/Code/used_car_price_dataset_extended.csv')  # read_csv already returns a DataFrame
# print(d)
# head() returns the first 5 rows by default
# print(d.head())
# tail() returns the last 5 rows by default
# print(d.tail())



### Row-level access

In [28]:
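# iloc: position-based; row 0 and column 2 in a single indexer
print(d.iloc[0, 2])
# loc: label-based; the row whose index label is 0
print(d.loc[0])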

4000
make_year                  2001
mileage_kmpl               8.17
engine_cc                  4000
fuel_type                Petrol
owner_count                   4
price_usd               8587.64
brand                 Chevrolet
transmission             Manual
color                     White
service_history             NaN
accidents_reported            0
insurance_valid              No
Name: 0, dtype: object
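
With the default integer index, `iloc` and `loc` look alike, so the difference is easier to see on a frame with string labels. This is a small sketch: the frame `cars` and its second row are made up for illustration, while the first row borrows two values from the output above.

```python
import pandas as pd

# Illustrative frame with string row labels instead of 0, 1, ...
cars = pd.DataFrame(
    {'make_year': [2001, 2014], 'engine_cc': [4000, 1500]},
    index=['x', 'y'],
)

by_position = cars.iloc[0, 1]          # first row, second column
by_label = cars.loc['x', 'engine_cc']  # row labelled 'x', column 'engine_cc'
print(by_position, by_label)
```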



In [29]:
#data manipulation
import pandas as pd

# Creating the DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
    "Profession": ["Engineer", "Doctor", "Artist", "Teacher"]
}

df = pd.DataFrame(data)

# Display the DataFrame
print(df)

      Name  Age         City Profession
0    Alice   25     New York   Engineer
1      Bob   30  Los Angeles     Doctor
2  Charlie   35      Chicago     Artist
3    David   40      Houston    Teacher


In [31]:
df['Salary']=[50000,67000,45000,1000000]
print(df)

      Name  Age         City Profession   Salary
0    Alice   25     New York   Engineer    50000
1      Bob   30  Los Angeles     Doctor    67000
2  Charlie   35      Chicago     Artist    45000
3    David   40      Houston    Teacher  1000000


In [32]:
df.drop_duplicates()  # the parentheses are required; without them only the bound method is displayed

      Name  Age         City Profession   Salary
0    Alice   25     New York   Engineer    50000
1      Bob   30  Los Angeles     Doctor    67000
2  Charlie   35      Chicago     Artist    45000
3    David   40      Houston    Teacher  1000000
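
The frame above has no repeated rows, so removing duplicates leaves it unchanged. The sketch below uses a hypothetical frame `staff` that does contain a repeated row, and also shows the `subset` and `keep` parameters.

```python
import pandas as pd

# Hypothetical frame in which row 2 repeats row 0
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice'],
    'City': ['New York', 'Los Angeles', 'New York'],
})

# Returns a new frame; the first occurrence of each row is kept by default
deduped = staff.drop_duplicates()
print(deduped)

# Compare only the 'City' column and keep the last occurrence instead
by_city = staff.drop_duplicates(subset='City', keep='last')
print(by_city)
```

Note that `staff` itself is not modified; assign the result back if the deduplicated frame should replace it.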

In [33]:
df['Salary']+10000000

0    10050000
1    10067000
2    10045000
3    11000000
Name: Salary, dtype: int64
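
Arithmetic on a column is applied element-wise and returns a new Series, so the original column stays as it was unless the result is assigned back. A minimal sketch, assuming a hypothetical frame `pay` and an arbitrary raise of 1000:

```python
import pandas as pd

pay = pd.DataFrame({'Salary': [50000, 67000, 45000]})

# Element-wise addition returns a new Series; pay['Salary'] is untouched
raised = pay['Salary'] + 1000

# Store the result explicitly to keep it
pay['Raised'] = raised
print(pay)
```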

In [35]:
df.describe()

             Age          Salary
count   4.000000        4.000000
mean   32.500000   290500.000000
std     6.454972   473093.718693
min    25.000000    45000.000000
25%    28.750000    48750.000000
50%    32.500000    58500.000000
75%    36.250000   300250.000000
max    40.000000  1000000.000000


In [36]:
df.dtypes

Name          object
Age            int64
City          object
Profession    object
Salary         int64
dtype: object

In [2]:
import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values
data = {'A': [1, 2, np.nan], 'B': [np.nan, 5, 6], 'C': [7, 8, np.nan]}
df1 = pd.DataFrame(data)

print(df1)

     A    B    C
0  1.0  NaN  7.0
1  2.0  5.0  8.0
2  NaN  6.0  NaN


In [3]:
df1.isnull()

       A      B      C
0  False   True  False
1  False  False  False
2   True  False   True
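
The boolean mask is often reduced to a count of missing values per column. A short sketch on the same data; the names `frame`, `missing_per_column`, and `total_missing` are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 5, 6], 'C': [7, 8, np.nan]})

# True counts as 1 when summed, so this gives the number of NaNs in each column
missing_per_column = frame.isnull().sum()
print(missing_per_column)

# Summing once more gives the total for the whole frame
total_missing = int(missing_per_column.sum())
print(total_missing)
```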


In [4]:
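# The frame in this session is df1 (df is not defined here).
# The filled frame is only printed, not assigned, so df1 keeps its NaN values for the cells below.
print(df1.fillna(df1.mean()))

     A    B    C
0  1.0  5.5  7.0
1  2.0  5.0  8.0
2  1.5  6.0  7.5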

In [5]:
df1['C'] = df1['C'].fillna('unknown')
print(df1)

     A    B        C
0  1.0  NaN      7.0
1  2.0  5.0      8.0
2  NaN  6.0  unknown


In [6]:

df1 = df1.rename(columns={'A': 'Ali', 'B': 'Baba', 'C': '40D'})
print(df1.columns)

Index(['Ali', 'Baba', '40D'], dtype='object')


In [7]:
df1.loc[:, 'Ali']  # Raises KeyError if the column does not exist

0    1.0
1    2.0
2    NaN
Name: Ali, dtype: float64

In [10]:
# Fill each column's missing value with that column's sum.
# The frame is named df1; df is not defined in this session.
# The result is assigned back because df1[col].fillna(..., inplace=True) acts on an
# intermediate copy and, according to the pandas FutureWarning, will stop working in pandas 3.0.
df1['Ali'] = df1['Ali'].fillna(df1['Ali'].sum())
df1['Baba'] = df1['Baba'].fillna(df1['Baba'].sum())
print(df1)

   Ali  Baba      40D
0  1.0  11.0      7.0
1  2.0   5.0      8.0
2  3.0   6.0  unknown
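
Several columns can also be filled in one call by passing a dict of fill values to `DataFrame.fillna`. The sketch below rebuilds the two columns on a separate, illustrative frame named `frame` so that it runs on its own.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'Ali': [1, 2, np.nan], 'Baba': [np.nan, 5, 6]})

# One fill value per column; fillna returns a new frame and leaves `frame` unchanged
filled = frame.fillna({
    'Ali': frame['Ali'].sum(),    # sum() skips NaN by default
    'Baba': frame['Baba'].sum(),
})
print(filled)
```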


In [11]:
df1['40D'] = pd.to_numeric(df1['40D'], errors='coerce')  # Converts non-numeric values to NaN
#df1['40D']=df1.groupby('Baba')['40D'].sum()
print(df1)
# Assign the result back instead of calling fillna(..., inplace=True) on the selected column,
# which pandas warns will stop working in pandas 3.0
df1['40D'] = df1['40D'].fillna(0)
print(df1)

   Ali  Baba  40D
0  1.0  11.0  7.0
1  2.0   5.0  8.0
2  3.0   6.0  NaN
   Ali  Baba  40D
0  1.0  11.0  7.0
1  2.0   5.0  8.0
2  3.0   6.0  0.0
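
The effect of `errors='coerce'` is easiest to see on a column of strings. A minimal sketch with made-up values; `raw` and `numeric` are illustrative names.

```python
import pandas as pd

# Made-up text values: two parse as numbers, one does not
raw = pd.Series(['7', '8.5', 'unknown'])

# Unparseable entries become NaN instead of raising an error
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric)
print(numeric.dtype)
```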


In [12]:
df1.loc[df1['40D'] == 0, '40D'] = df1['Ali'] + df1['Baba']
print(df1)

   Ali  Baba  40D
0  1.0  11.0  7.0
1  2.0   5.0  8.0
2  3.0   6.0  9.0


In [13]:
df1.get('Ali')
df1['Ali'].dtype

dtype('float64')

In [14]:
df1.groupby('Baba').agg({'40D': 'sum', 'Ali': 'mean'})

      40D  Ali
Baba          
5.0   8.0  2.0
6.0   9.0  3.0
11.0  7.0  1.0
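
Each `Baba` value above is unique, so every group holds a single row. The sketch below uses a hypothetical frame `sales` with a repeated key, and named aggregation to give the result columns explicit names; all column names here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data in which team 'a' appears twice
sales = pd.DataFrame({
    'team': ['a', 'a', 'b'],
    'units': [1, 2, 5],
    'score': [4.0, 6.0, 9.0],
})

# Named aggregation: result_column=(source_column, function)
summary = sales.groupby('team').agg(
    total_units=('units', 'sum'),
    mean_score=('score', 'mean'),
)
print(summary)
```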


In [15]:
df1['40D']=df1['40D'].astype(int)
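
The cast to `int` works here only because the missing values were filled first. As a sketch of what happens otherwise, the example below tries the same cast on an illustrative Series that still contains NaN, then falls back to the nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

values = pd.Series([7.0, 8.0, np.nan])

try:
    values.astype(int)  # NaN has no plain-integer representation
    cast_failed = False
except ValueError as exc:  # pandas raises ValueError (or a subclass of it) here
    cast_failed = True
    print('cannot convert:', exc)

# The nullable integer dtype stores the missing entry as <NA>
nullable = values.astype('Int64')
print(nullable)
```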