Pandas is a Python library used for data analysis and manipulation. It provides data structures like Series and DataFrame for handling structured data efficiently.

In [None]:
import pandas as pd


**Creating data**


There are two core objects in pandas: the DataFrame and the Series.

**Series**

A Series is a sequence of data values. If a DataFrame is a table, a Series is a list, and you can create one from nothing more than a list:

In [2]:
# Creating a Pandas Series
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])
print(s)


0    10
1    20
2    30
3    40
4    50
dtype: int64


A Series is, in essence, a single column of a DataFrame. It has no column names, only one overall name. You can assign row labels to it with the index parameter:

In [3]:
# Creating a Pandas Series with custom index labels
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(s)


a    10
b    20
c    30
d    40
e    50
dtype: int64
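
The overall name of a Series can be set with the `name` parameter of `pd.Series`. The following is a minimal sketch; the name `'Scores'` and the variable `named` are made up for illustration:

```python
import pandas as pd

# 'Scores' is a hypothetical name chosen for this example
named = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='Scores')

print(named.name)        # the single overall name of the Series
print(named.to_frame())  # as a DataFrame, the name becomes the column label
```

Converting with `to_frame()` shows how the Series name maps onto a column label once the data sits in a table.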


In [None]:
# Key functions on Series
# (in a notebook, only the value of the last expression is displayed)
s.head()      # First 5 elements
s.tail()      # Last 5 elements
s.mean()      # Mean of values
s.max()       # Maximum value
s.min()       # Minimum value
s.describe()  # Summary statistics


count     5.000000
mean     30.000000
std      15.811388
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
dtype: float64


In [None]:
s.mean()      # Mean of values

np.float64(30.0)

In [None]:
s.tail()

a    10
b    20
c    30
d    40
e    50
dtype: int64
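
Because the Series here has exactly five elements, `head()` and `tail()` both return all of it. As a small sketch of how they behave with an explicit count, and of printing several results from one cell:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# head() and tail() accept an optional element count (the default is 5)
first_two = s.head(2)
last_two = s.tail(2)
print(first_two)
print(last_two)

# Scalar summaries; print() is needed to see more than one result per cell
print(s.mean(), s.max(), s.min())
```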


**DataFrame**

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

For example, consider the following simple DataFrame:

In [None]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

   Yes   No
0   50  131
1   21    2


In this example, the "0, No" entry has the value of 131. The "0, Yes" entry has a value of 50, and so on.
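
To make the "row, column" addressing concrete, here is a short sketch that reads those two entries back with `loc`. The variable name `votes` is only illustrative:

```python
import pandas as pd

# 'votes' is an illustrative variable name, not part of the original example
votes = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

entry_no = votes.loc[0, 'No']    # row label 0, column 'No'
entry_yes = votes.loc[0, 'Yes']  # row label 0, column 'Yes'
print(entry_no, entry_yes)
```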

DataFrame entries are not limited to integers. For instance, here's a DataFrame that mixes a column of strings with columns of integers:

In [None]:
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000


In [None]:
# Key Functions on DataFrames:

df.head()       # First 5 rows
df.tail()       # Last 5 rows
df.info()       # Summary of DataFrame
df.describe()   # Statistics summary
df.shape        # Get dimensions (rows, columns)
df.columns      # Get column names
df.dtypes       # Data types of columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   Salary  3 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes


Name      object
Age        int64
Salary     int64
dtype: object
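
The cell above displays only the `info()` printout and the final `dtypes` result. A small sketch of how the other attributes might be inspected individually, using the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

n_rows, n_cols = df.shape       # shape is an attribute, not a method
column_names = list(df.columns)
print(n_rows, n_cols)
print(column_names)

# describe() summarizes the numeric columns only by default
means = df.describe().loc['mean']
print(means)
```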


**1. Selecting Columns in a DataFrame**

A Pandas DataFrame consists of rows (identified by the index) and columns (features). Columns are selected by name.

1️⃣ Selecting a Single Column

You can select a single column in two ways:

In [None]:
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)

# Selecting a single column using dictionary-like notation
print(df['Age'])  # Returns a Pandas Series

# Selecting a single column using dot notation
print(df.Age)  # Works only if the column name is a valid Python identifier (e.g. no spaces)


0    25
1    30
2    35
Name: Age, dtype: int64
0    25
1    30
2    35
Name: Age, dtype: int64


Dot notation (df.Age) is convenient, but bracket notation (df['Age']) is safer, especially when column names have spaces.
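
A brief sketch of where dot notation breaks down. The column names `'Years Employed'` and `'count'` are hypothetical, chosen to show a name with a space and a name that collides with an existing DataFrame method:

```python
import pandas as pd

# Hypothetical columns: one with a space, one named like a DataFrame method
staff = pd.DataFrame({'Years Employed': [2, 5], 'count': [1, 3]})

years = staff['Years Employed']  # bracket notation handles the space
counts = staff['count']          # bracket notation returns the column
print(years)
print(counts)

# Dot notation resolves to the DataFrame.count method, not the column
print(callable(staff.count))
```

`staff.Years Employed` is not even valid Python syntax, and `staff.count` silently returns a method instead of data, which is why bracket notation is the safer default.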

2️⃣ Selecting Multiple Columns
To select multiple columns, use a list of column names:

In [None]:
df[['Name', 'Salary']]

      Name  Salary
0    Alice   50000
1      Bob   60000
2  Charlie   70000


**2. Selecting Rows in a DataFrame**

Pandas provides two main methods to select rows:

| Method | Purpose |
| --- | --- |
| df.loc[] | Selects rows by label (row name or index) |
| df.iloc[] | Selects rows by position (row number) |

1️⃣ Selecting Rows by Index Using loc[]

In [None]:
df.loc[0]       # Selects the row labeled 0 (the first row here)
df.loc[1:2]     # Selects rows labeled 1 through 2 (the end label is included)
df.loc[[0, 2]]  # Selects the rows labeled 0 and 2


      Name  Age  Salary
0    Alice   25   50000
2  Charlie   35   70000


**Key Rule:**

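Use a single pair of brackets [ ] with : for slicing (range selection), for example df.loc[1:2].

Use double brackets [[ ]] to select specific rows: the inner brackets are a Python list of labels separated by commas, for example df.loc[[0, 2]].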
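
One consequence of this rule is worth seeing directly: a single label returns a Series, while a list of labels returns a DataFrame, even if the list has only one item. A minimal sketch using the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

one_row = df.loc[0]        # single label: the row comes back as a Series
one_row_df = df.loc[[0]]   # list with one label: a one-row DataFrame
row_slice = df.loc[1:2]    # slice: a DataFrame with rows 1 and 2

print(type(one_row).__name__)
print(type(one_row_df).__name__)
print(row_slice)
```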

2️⃣ Selecting Rows by Position Using iloc[]

In [None]:
df.iloc[0]       # First row
df.iloc[1:3]     # Rows at positions 1 and 2 (the end position is excluded)
df.iloc[[0, 2]]  # Rows at positions 0 and 2

      Name  Age  Salary
0    Alice   25   50000
2  Charlie   35   70000
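
With the default 0, 1, 2 index, labels and positions coincide, so `loc` and `iloc` can look interchangeable. The sketch below uses hypothetical string labels (`'x'`, `'y'`, `'z'`) to separate the two, and shows that a `loc` slice includes its end label while an `iloc` slice excludes its end position:

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['x', 'y', 'z'],  # hypothetical row labels for illustration
)

by_label = df_labeled.loc['x':'y']   # label slice: 'y' is included
by_position = df_labeled.iloc[0:1]   # position slice: position 1 is excluded

print(by_label)
print(by_position)
```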


**3. Selecting Specific Elements**

In [None]:
# You can select specific values using row and column indexes.

df.loc[0, 'Name']  # Value in the Name column at row label 0 ('Alice')

df.iloc[1, 2]  # Value at the 2nd row, 3rd column (60000)


60000

**4. Selecting Specific Rows & Columns**

In [None]:
df.loc[0:1, ['Name', 'Salary']]  # First 2 rows, selected columns
df.iloc[0:2, 1:]  # First 2 rows, columns from index 1 onwards

   Age  Salary
0   25   50000
1   30   60000


| Task | Method |
| --- | --- |
| Select single column | df['Age'] |
| Select multiple columns | df[['Name', 'Salary']] |
| Select row by index | df.loc[0] |
| Select row by position | df.iloc[0] |
| Select specific value | df.loc[0, 'Name'] |
| Select rows based on condition | df[df['Age'] > 25] |
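
The last entry, selecting rows based on a condition, is not demonstrated above. A minimal sketch of how it works with the same example data: the comparison produces a boolean Series, and indexing with it keeps the rows marked True. The second filter, combining two conditions with `&`, is an added illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

mask = df['Age'] > 25   # boolean Series: False, True, True
older = df[mask]        # keeps only the rows where the mask is True
print(mask)
print(older)

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid_range = df[(df['Age'] > 25) & (df['Salary'] < 70000)]
print(mid_range)
```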
