# **Shape of Data**
using **`shape`**

In [None]:
import pandas as pd

data_01 = {
    "Name": ["John", "Jane", "Bob", "Alice"],
    "Age": [25, 30, 35, 40],
    "Salary": [50000, 60000, 70000, 80000],
    "Department": ["IT", "HR", "IT", "Finance"]
}
df = pd.DataFrame(data_01)
print(f"Shape of Data : {df.shape}")

Shape of Data : (4, 4)
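
`shape` returns a plain tuple of `(rows, columns)`, so it can be unpacked into two variables. The sketch below assumes a small hypothetical frame named `df_demo`, which is not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy frame used only for illustration
df_demo = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Alice"],
    "Age": [25, 30, 35, 40],
})

# shape is a (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = df_demo.shape
print(f"Rows: {n_rows}, Columns: {n_cols}")

# len() of a DataFrame gives the row count only
print(len(df_demo))
```

Unpacking is handy when only one of the two numbers is needed later, for example to report the row count before and after filtering.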


# **Column Names**

In [None]:
print(f"Columns Name : {df.columns}")

Columns Name : Index(['Name', 'Age', 'Salary', 'Department'], dtype='object')
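
`df.columns` is an `Index` object rather than a list. When a plain Python list or a column count is more convenient, something like the following may help. The frame `df_demo` is a hypothetical stand-in, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy frame used only for illustration
df_demo = pd.DataFrame({"Name": ["John"], "Age": [25], "Salary": [50000]})

# Convert the Index of column labels to a regular Python list
column_list = df_demo.columns.tolist()
print(column_list)

# Number of columns
n_columns = len(df_demo.columns)
print(n_columns)

# Membership test: does a column with this label exist?
has_age = "Age" in df_demo.columns
print(has_age)
```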


# **Load a Larger Dataset to Check Its Shape and Number of Columns**

In [None]:
dataPath = "/content/drive/MyDrive/Pandas for Data Analysis/Data/SampleSuperstore.xlsx"

df = pd.read_excel(dataPath)
print(df.head(1))

   Row ID        Order ID Order Date  Ship Date     Ship Mode Customer ID  \
0       1  CA-2016-152156 2016-11-08 2016-11-11  Second Class    CG-12520   

  Customer Name   Segment        Country       City  ... Postal Code  Region  \
0   Claire Gute  Consumer  United States  Henderson  ...       42420   South   

        Product ID   Category Sub-Category                       Product Name  \
0  FUR-BO-10001798  Furniture    Bookcases  Bush Somerset Collection Bookcase   

    Sales  Quantity  Discount   Profit  
0  261.96         2       0.0  41.9136  

[1 rows x 21 columns]


In [None]:
print(f"Shape of Data {df.shape}")

Shape of Data (9994, 21)


In [None]:
print(f"Columns Names of Data: {df.columns}")

Columns Names of Data: Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
       'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
       'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
       'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
      dtype='object')


# **Selecting & Filtering**

## **01- Selecting Columns**

The syntax you use to select columns in pandas determines whether you get back a **Series** (1D) or a **DataFrame** (2D).

---

### **1. Return a Series (1D)**
* **Bracket Notation:** `df['column_name']`
    * Used for selecting a single column.
    * Returns the data as a Pandas Series.
* **Dot Notation:** `df.column_name`
    * Convenient and quick to type.
    * **Limitations:**
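        * Only works if the column name is a valid Python identifier (for example, it has **no spaces**).
        * Cannot be used if the name conflicts with a DataFrame attribute or method (e.g., `count`, `sum`, `mean`); in that case the method is returned instead of the column.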

### **2. Return a DataFrame (2D)**
* **Multiple Columns:** `df[['col_1', 'col_2']]`
    * Requires a **"double bracket"** syntax.
    * The outer brackets are for indexing, while the inner brackets define a **list** of column names.
* **Single Column as DataFrame:** `df[['column_name']]`
    * By passing a single-item list, you force Pandas to retain the DataFrame structure instead of converting it to a Series.
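
The difference between the selection styles listed above can be checked with `type()`. This is a minimal sketch on a hypothetical frame `df_demo`; the column named `count` is included only to illustrate the dot-notation conflict.

```python
import pandas as pd

# Hypothetical toy frame; "count" deliberately clashes with a DataFrame method
df_demo = pd.DataFrame({
    "Customer Name": ["Claire Gute", "Darrin Van Huff"],
    "City": ["Henderson", "Los Angeles"],
    "count": [2, 1],
})

single_series = df_demo["City"]                  # bracket notation
dot_series = df_demo.City                        # dot notation
single_frame = df_demo[["City"]]                 # one-item list
multi_frame = df_demo[["Customer Name", "City"]] # list of columns

print(type(single_series).__name__)
print(type(dot_series).__name__)
print(type(single_frame).__name__)
print(type(multi_frame).__name__)

# Dot notation finds the DataFrame method, not the column
dot_is_method = callable(df_demo.count)
print(dot_is_method)

# Bracket notation still reaches the column
count_column = df_demo["count"]
print(count_column.tolist())
```

Bracket notation works for every column name, which is why it is generally the safer default.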


In [6]:
import pandas as pd
data_path = "/content/drive/MyDrive/Pandas for Data Analysis/Data/SampleSuperstore.xlsx"

df = pd.read_excel(data_path)
print(df.columns)

Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
       'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
       'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
       'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
      dtype='object')


In [10]:
# Select one Column
Customers_Names = df["Customer Name"]
print(Customers_Names.head())

0        Claire Gute
1        Claire Gute
2    Darrin Van Huff
3     Sean O'Donnell
4     Sean O'Donnell
Name: Customer Name, dtype: object


In [11]:
# Select two or more columns
Customers_Data = df[["Customer Name", "Country", "City"]]
print(Customers_Data.head())

     Customer Name        Country             City
0      Claire Gute  United States        Henderson
1      Claire Gute  United States        Henderson
2  Darrin Van Huff  United States      Los Angeles
3   Sean O'Donnell  United States  Fort Lauderdale
4   Sean O'Donnell  United States  Fort Lauderdale


## **02- Filtering Rows**

In [20]:
# Select Customers (City = Henderson)
Customers = df[df["City"] == "Henderson"]
print(Customers.head())

     Row ID        Order ID Order Date  Ship Date       Ship Mode Customer ID  \
0         1  CA-2016-152156 2016-11-08 2016-11-11    Second Class    CG-12520   
1         2  CA-2016-152156 2016-11-08 2016-11-11    Second Class    CG-12520   
538     539  CA-2015-134894 2015-12-07 2015-12-11  Standard Class    DK-12985   
539     540  CA-2015-134894 2015-12-07 2015-12-11  Standard Class    DK-12985   
996     997  CA-2015-162537 2015-10-28 2015-11-03  Standard Class    RD-19585   

      Customer Name   Segment        Country       City  ... Postal Code  \
0       Claire Gute  Consumer  United States  Henderson  ...       42420   
1       Claire Gute  Consumer  United States  Henderson  ...       42420   
538  Darren Koutras  Consumer  United States  Henderson  ...       42420   
539  Darren Koutras  Consumer  United States  Henderson  ...       42420   
996        Rob Dowd  Consumer  United States  Henderson  ...       42420   

     Region       Product ID         Category Sub-Catego

In [23]:
data = {
    "Name": ["Ali", "Bilal", "Ramzan", "Fatima", "Moosa"],
    "Salary": [20000, 40000, 50000, 20000, 30000],
    "Department": ["IT", "HR", "IT", "Finance", "IT"],
    "Age": [25, 30, 35, 40, 45],
    "Gender": ["Male", "Male", "Male", "Female", "Male"],
    "City": ["Karachi", "Lahore", "Islamabad", "Peshawar", "Quetta"],
    "Country": ["Pakistan", "Pakistan", "Pakistan", "Pakistan", "Pakistan"]
}
df = pd.DataFrame(data)
print(df)

     Name  Salary Department  Age  Gender       City   Country
0     Ali   20000         IT   25    Male    Karachi  Pakistan
1   Bilal   40000         HR   30    Male     Lahore  Pakistan
2  Ramzan   50000         IT   35    Male  Islamabad  Pakistan
3  Fatima   20000    Finance   40  Female   Peshawar  Pakistan
4   Moosa   30000         IT   45    Male     Quetta  Pakistan


In [26]:
# One Condition (Salary above 30000)
high_salary = df[df["Salary"] > 30000]
print(high_salary)

     Name  Salary Department  Age Gender       City   Country
1   Bilal   40000         HR   30   Male     Lahore  Pakistan
2  Ramzan   50000         IT   35   Male  Islamabad  Pakistan


In [28]:
# Two Conditions (AND Operator)
high_salary_IT = df[(df["Salary"] > 30000) & (df["Department"] == "IT")]
print(high_salary_IT)

     Name  Salary Department  Age Gender       City   Country
2  Ramzan   50000         IT   35   Male  Islamabad  Pakistan
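
Each condition is wrapped in parentheses because `&` binds more tightly than comparison operators such as `>` and `==`. One way to sidestep the precedence issue is to name each mask first, as sketched below on a hypothetical frame `df_demo`. The sketch also shows that the Python keyword `and` does not work on a Series.

```python
import pandas as pd

# Hypothetical toy frame used only for illustration
df_demo = pd.DataFrame({
    "Name": ["Ali", "Bilal", "Ramzan"],
    "Salary": [20000, 40000, 50000],
    "Department": ["IT", "HR", "IT"],
})

# Named masks make the combined condition easier to read
is_high = df_demo["Salary"] > 30000
is_it = df_demo["Department"] == "IT"

result = df_demo[is_high & is_it]
print(result)

# The keyword `and` needs a single True/False value, so it fails on a Series
and_failed = False
try:
    df_demo[is_high and is_it]
except ValueError:
    and_failed = True
print(and_failed)
```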


In [29]:
# Two Conditions (OR Operator)
City = df[(df["City"] == "Karachi") | (df["City"] == "Lahore")]
print(City)

    Name  Salary Department  Age Gender     City   Country
0    Ali   20000         IT   25   Male  Karachi  Pakistan
1  Bilal   40000         HR   30   Male   Lahore  Pakistan
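
When several OR conditions test the same column against different values, `Series.isin` is a shorter alternative. This sketch assumes a hypothetical frame `df_demo` and compares both forms.

```python
import pandas as pd

# Hypothetical toy frame used only for illustration
df_demo = pd.DataFrame({
    "Name": ["Ali", "Bilal", "Ramzan"],
    "City": ["Karachi", "Lahore", "Islamabad"],
})

# Chained OR conditions
with_or = df_demo[(df_demo["City"] == "Karachi") | (df_demo["City"] == "Lahore")]

# Same filter with isin and a list of accepted values
with_isin = df_demo[df_demo["City"].isin(["Karachi", "Lahore"])]

print(with_isin)
same_result = with_or.equals(with_isin)
print(same_result)
```

The `isin` form scales better as the list of accepted values grows.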


In [32]:
# Negating a Condition (NOT Operator)
Not_IT = df[~(df["Department"] == "IT")]
print(Not_IT)

     Name  Salary Department  Age  Gender      City   Country
1   Bilal   40000         HR   30    Male    Lahore  Pakistan
3  Fatima   20000    Finance   40  Female  Peshawar  Pakistan
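
For a single equality test, `!=` gives the same rows as negating `==` with `~`. The `~` operator becomes more useful when negating a mask that has no direct opposite, such as `isin`. The sketch below assumes a hypothetical frame `df_demo`.

```python
import pandas as pd

# Hypothetical toy frame used only for illustration
df_demo = pd.DataFrame({
    "Name": ["Ali", "Bilal", "Fatima"],
    "Department": ["IT", "HR", "Finance"],
})

# Two equivalent ways to exclude the IT department
not_it_tilde = df_demo[~(df_demo["Department"] == "IT")]
not_it_ne = df_demo[df_demo["Department"] != "IT"]
same_result = not_it_tilde.equals(not_it_ne)
print(same_result)

# Negating isin: rows whose department is in neither IT nor HR
not_it_or_hr = df_demo[~df_demo["Department"].isin(["IT", "HR"])]
print(not_it_or_hr)
```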
