# Day 3 – Working with CSV Files in Google Colab (Environment Debugging + EDA)

## 🎯 Objective
Learn how to properly upload, locate, verify, and load raw CSV data in Google Colab using pandas.

---

## 🔹 Task 1 – Understanding the Colab Environment

Google Colab runs on a **remote virtual machine**, not on my local computer.

Because of this:
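- Files must be uploaded into the session; by default they land in the working directory, `/content/`
- A relative path such as `pd.read_csv("Pokemon_data.csv")` only works if that file exists in the working directory of the active session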
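A quick way to see where a relative filename actually points. This is a minimal sketch: `resolve_relative` is a hypothetical helper name, and the `/content` value in the comment assumes a default Colab session.

```python
import os

def resolve_relative(name):
    """Return the absolute path a relative name resolves to, and whether it exists."""
    full_path = os.path.abspath(name)   # joins the name onto os.getcwd()
    return full_path, os.path.exists(full_path)

print(os.getcwd())                       # typically "/content" in a fresh Colab session
print(resolve_relative("Pokemon_data.csv"))
```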


- STEP ONE: Understand Colab's environment. Because Colab runs on a **remote virtual machine**, not your PC, `df = pd.read_csv("Pokemon_data.csv")` cannot see files that exist only on your computer. My first attempt failed:

```python
import pandas as pd

df = pd.read_csv("Pokemon_data.csv")

print(df)
```

```
IsADirectoryError: [Errno 21] Is a directory: 'Pokemon_data.csv'
```
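A sketch of telling the two common failure modes apart before calling pandas. `diagnose_csv_path` is a hypothetical helper, not part of pandas or Colab; it relies on `open()` raising `IsADirectoryError` for folders, which is the behavior on Linux (the OS Colab runs on).

```python
def diagnose_csv_path(path):
    """Classify a path before reading it as a CSV (hypothetical helper)."""
    try:
        with open(path, "rb"):
            return "file"          # a regular file that can be opened
    except FileNotFoundError:
        return "missing"           # not uploaded to this session, or wrong name
    except IsADirectoryError:
        return "directory"         # a folder has this name, not a file

print(diagnose_csv_path("Pokemon_data.csv"))
```

Only a `"file"` result means `pd.read_csv` has something it can actually open.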




- STEP TWO: Check whether the file was actually uploaded by running `os.listdir()`.



---

## 🔹 Task 2 – Verifying File Location

Used:

```python
import os
os.listdir()
```

```
['.config', 'Pokemon_data.csv', 'sample_data']
```

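`Pokemon_data.csv` appears in the listing (it was uploaded manually), so something with that name exists in the session. However, `os.listdir()` does not say whether it is a file or a folder, and loading it still failed: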

```python
df = pd.read_csv("Pokemon_data.csv")
df.head()
```

```
IsADirectoryError: [Errno 21] Is a directory: 'Pokemon_data.csv'
```

Explanation of this error message: `Pokemon_data.csv` is a directory, not a file. Uploading with `files.upload("Pokemon_data.csv")` created a folder with that name and placed the CSV inside it (note the `Saving ... to Pokemon_data.csv/Pokemon_data.csv` message below); the argument to `files.upload()` appears to be treated as the destination folder. Therefore, the correct path to the CSV file is `Pokemon_data.csv/Pokemon_data.csv`.
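A defensive loading sketch for exactly this nested layout, assuming the inner file shares the folder's name. `resolve_csv_path` is a hypothetical helper, not a pandas or Colab function.

```python
import os

def resolve_csv_path(path):
    """Return a readable CSV path, looking one level inside if `path` is a folder.

    Hypothetical helper for the case where a folder named like the CSV
    contains a file with the same name (e.g. Pokemon_data.csv/Pokemon_data.csv).
    """
    if os.path.isfile(path):
        return path
    if os.path.isdir(path):
        # Look for a file inside the folder that has the folder's own name
        inner = os.path.join(path, os.path.basename(os.path.normpath(path)))
        if os.path.isfile(inner):
            return inner
        raise FileNotFoundError(f"{path!r} is a folder without a same-named CSV inside")
    raise FileNotFoundError(f"{path!r} does not exist in this session")
```

Usage would look like `df = pd.read_csv(resolve_csv_path("Pokemon_data.csv"))`, though fixing the upload itself (Task 3) is the cleaner solution.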

```python
from google.colab import files
uploaded = files.upload("Pokemon_data.csv")  # this argument ended up as the folder name (see output)
```

```
Saving Pokemon_data.csv to Pokemon_data.csv/Pokemon_data.csv
```


**I uploaded the file, but now I have a folder named "Pokemon_data.csv" with a file also named "Pokemon_data.csv" inside it.**

```python
import pandas as pd

df = pd.read_csv("Pokemon_data.csv/Pokemon_data.csv")

print(df)
```

```
      No        Name    Type1   Type2  Height  Weight  Legendary
0      1   Bulbasaur    Grass  Poison     0.7     6.9          0
1      2     Ivysaur    Grass  Poison     1.0    13.0          0
2      3    Venusaur    Grass  Poison     2.0   100.0          0
3      4  Charmander     Fire     NaN     0.6     8.5          0
4      5  Charmeleon     Fire     NaN     1.1    19.0          0
..   ...         ...      ...     ...     ...     ...        ...
145  146     Moltres     Fire  Flying     2.0    60.0          1
146  147     Dratini   Dragon     NaN     1.8     3.3          0
147  148   Dragonair   Dragon     NaN     4.0    16.5          0
148  149   Dragonite   Dragon  Flying     2.2   210.0          0
149  150      Mewtwo  Psychic     NaN     2.0   122.0          1

[150 rows x 7 columns]
```


**THIS IS THE FOLDER**

```python
!ls
```

```
Pokemon_data.csv  sample_data
```


**THIS IS THE FILE** (running `ls` on the folder lists the CSV inside it)

```python
!ls Pokemon_data.csv
```

```
Pokemon_data.csv
```
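`!ls` shows one level at a time. A sketch of printing the whole nested structure at once with `os.walk`, so a folder hiding a same-named file stands out; `tree` is a hypothetical helper name.

```python
import os

def tree(root="."):
    """Return an indented listing of every folder and file under `root`."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                       # walk subfolders in a stable order
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth:                             # print each subfolder with a trailing "/"
            lines.append("  " * (depth - 1) + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            lines.append("  " * depth + name)
    return lines
```

For example, `print("\n".join(tree(".")))` from `/content` would show `Pokemon_data.csv/` with an indented `Pokemon_data.csv` beneath it. Note that `.config` and `sample_data` also contain files, so the full listing can be long.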


---

## 🔹 Task 3 – Cleaning the Environment

Used:

```python
!rm -rf Pokemon_data.csv
```
- Then re-uploaded the file properly, calling `files.upload()` without an argument

Verified again using:

```python
os.listdir()
```
- Final clean load:

```python
df = pd.read_csv("Pokemon_data.csv")
```

```python
# Delete the folder (and the file inside it)
!rm -rf Pokemon_data.csv
```
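`!rm -rf` deletes whatever matches the name, file or folder, without confirmation. A more cautious Python sketch that only removes the path when it is a folder, using `shutil.rmtree`; `remove_stray_folder` is a hypothetical helper name.

```python
import os
import shutil

def remove_stray_folder(path):
    """Delete `path` only if it is a folder; never touch a regular file.

    Hypothetical, more cautious alternative to `!rm -rf path`.
    """
    if os.path.isdir(path):
        shutil.rmtree(path)   # removes the folder and everything inside it
        return True
    return False              # missing, or a regular file: leave it alone
```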

```python
# Upload again, this time without passing a folder name
from google.colab import files
files.upload()
```

```
Saving Pokemon_data.csv to Pokemon_data.csv
```

```
{'Pokemon_data.csv': b"No,Name,Type1,Type2,Height,Weight,Legendary\r\n1,Bulbasaur,Grass,Poison,0.7,6.9,0\r\n2,Ivysaur,Grass,Poison,1,13,0\r\n3,Venusaur,Grass,Poison,2,100,0\r\n4,Charmander,Fire,,0.6,8.5,0\r\n5,Charmeleon,Fire,,1.1,19,0\r\n6,Charizard,Fire,Flying,1.7,90.5,0\r\n7,Squirtle,Water,,0.5,9,0\r\n8,Wartortle,Water,,1,22.5,0\r\n9,Blastoise,Water,,1.6,85.5,0\r\n10,Caterpie,Bug,,0.3,2.9,0\r\n11,Metapod,Bug,,0.7,9.9,0\r\n12,Butterfree,Bug,Flying,1.1,32,0\r\n13,Weedle,Bug,Poison,0.3,3.2,0\r\n14,Kakuna,Bug,Poison,0.6,10,0\r\n15,Beedrill,Bug,Poison,1,29.5,0\r\n16,Pidgey,Normal,Flying,0.3,1.8,0\r\n17,Pidgeotto,Normal,Flying,1.1,30,0\r\n18,Pidgeot,Normal,Flying,1.5,39.5,0\r\n19,Rattata,Normal,,0.3,3.5,0\r\n20,Raticate,Normal,,0.7,18.5,0\r\n21,Spearow,Normal,Flying,0.3,2,0\r\n22,Fearow,Normal,Flying,1.2,38,0\r\n23,Ekans,Poison,,2,6.9,0\r\n24,Arbok,Poison,,3.5,65,0\r\n25,Pikachu,Electric,,0.4,6,0\r\n26,Raichu,Electric,,0.8,30,0\r\n27,Sandshrew,Ground,,0.6,12,0\r\n28,Sandslash,Ground,,1,29
```
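As the echoed output shows, `files.upload()` returns a dict mapping each filename to its raw bytes. Here is a sketch of loading those bytes straight into pandas with `io.BytesIO`, which skips the filesystem entirely. The `uploaded` dict below is a stand-in built from the first rows of that output, not a real upload.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload() (rows copied from the output above)
uploaded = {
    "Pokemon_data.csv": (
        b"No,Name,Type1,Type2,Height,Weight,Legendary\r\n"
        b"1,Bulbasaur,Grass,Poison,0.7,6.9,0\r\n"
        b"4,Charmander,Fire,,0.6,8.5,0\r\n"
    )
}

# Wrap the bytes in a file-like object so read_csv can parse them directly
df_from_bytes = pd.read_csv(io.BytesIO(uploaded["Pokemon_data.csv"]))
print(df_from_bytes)
```

With a real upload, you would write `uploaded = files.upload()` and keep the same `read_csv` line. Whether the file lands in a folder or not then stops mattering.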

```python
# Verify
import os
os.listdir()
```

```
['.config', 'Pokemon_data.csv', 'sample_data']
```

```python
import pandas as pd

# Test load normally
df = pd.read_csv("Pokemon_data.csv")
df.head()
```

```
   No        Name  Type1   Type2  Height  Weight  Legendary
0   1   Bulbasaur  Grass  Poison     0.7     6.9          0
1   2     Ivysaur  Grass  Poison     1.0    13.0          0
2   3    Venusaur  Grass  Poison     2.0   100.0          0
3   4  Charmander   Fire     NaN     0.6     8.5          0
4   5  Charmeleon   Fire     NaN     1.1    19.0          0
```
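Here is a lightweight check that could run right after a load to confirm the data came in as expected. It is a sketch: `check_load` and `EXPECTED_COLUMNS` are hypothetical names, and the column list is taken from the `df.columns` output shown in Task 5. The `Unnamed` test relies on pandas naming blank header cells `Unnamed: N`.

```python
import pandas as pd

EXPECTED_COLUMNS = ["No", "Name", "Type1", "Type2", "Height", "Weight", "Legendary"]

def check_load(df, expected_columns=EXPECTED_COLUMNS):
    """Return a list of problems with a freshly loaded DataFrame (empty list = looks OK)."""
    problems = []
    if list(df.columns) != expected_columns:
        problems.append(f"unexpected columns: {list(df.columns)}")
    if df.empty:
        problems.append("no rows loaded")
    if any(str(c).startswith("Unnamed") for c in df.columns):
        problems.append("an 'Unnamed' column suggests a saved index or a blank header cell")
    return problems
```

Running `print(check_load(df) or "load looks OK")` after `pd.read_csv` turns a silent mis-load into a visible message.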


**TASK 4:** As a data engineer (DE), always verify whether a path is a folder or a file. Below are several ways to check this during the debugging stage.

---

## 🔹 Task 4 – File vs Folder Verification (Data Engineering Debugging)

Used:

**Method 1 - Python Logic**
```python
os.path.isdir(path)   # True if path is an existing folder
os.path.isfile(path)  # True if path is an existing regular file
```
**Method 2 - Terminal Style (Common in DE)**
```python
!ls -l
```
**Method 3 - Pro-Level Print Format**

```python
print(item, "->", "Folder" if os.path.isdir(item) else "File")
```

```python
# Method 1 - Using an if statement
import os

for item in os.listdir():
    if os.path.isdir(item):
        print(item, "Folder")
    elif os.path.isfile(item):
        print(item, "File")

os.listdir()
```

```
.config Folder
Pokemon_data.csv File
sample_data Folder

['.config', 'Pokemon_data.csv', 'sample_data']
```

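```python
# Method 2 - Quick terminal style (common in DE):
# the first character of each line is "d" for a folder and "-" for a regular file
# (hidden entries such as .config only appear with ls -la)
!ls -l
```

```
total 12
-rw-r--r-- 1 root root 4859 Feb 19 22:19 Pokemon_data.csv
drwxr-xr-x 1 root root 4096 Jan 16 14:24 sample_data
```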
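The `d`/`-` flag that `ls -l` prints can also be read from Python. Here is a sketch using `os.lstat` and `stat.filemode`, which builds the same permission string. `ls_mode` is a hypothetical helper name. Because `lstat` reports a symlink itself instead of following it, a symlink shows the flag `l`.

```python
import os
import stat

def ls_mode(path):
    """Return the permission string `ls -l` would print, e.g. 'drwxr-xr-x'."""
    return stat.filemode(os.lstat(path).st_mode)

labels = {"d": "Folder", "-": "File", "l": "Symlink"}
for item in sorted(os.listdir()):
    mode = ls_mode(item)
    print(mode, item, "->", labels.get(mode[0], "Other"))
```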


```python
# Method 3 - Pro-level

import os

for item in os.listdir():
    print(item, "->", "Folder" if os.path.isdir(item) else "File")
os.listdir()
```

```
.config -> Folder
Pokemon_data.csv -> File
sample_data -> Folder

['.config', 'Pokemon_data.csv', 'sample_data']
```
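Method 3 labels anything that is not a folder as "File", including a path that does not exist. Below is a sketch of a stricter variant using `pathlib`. `classify` is a hypothetical helper name.

```python
from pathlib import Path

def classify(path):
    """Label a path as Folder, File, Missing, or Other (e.g. a broken symlink)."""
    p = Path(path)
    if p.is_dir():
        return "Folder"
    if p.is_file():
        return "File"
    if not p.exists() and not p.is_symlink():
        return "Missing"
    return "Other"

for item in sorted(Path(".").iterdir()):
    print(item.name, "->", classify(item))
```

The explicit `"Missing"` case matters when the path comes from a typed string, as with `pd.read_csv("Pokemon_data.csv")`, rather than from a directory listing.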

**TASK 5:** Run basic data exploration steps, also called **EDA (Exploratory Data Analysis)**.

---

## 🔹 Task 5 – Basic EDA (Exploratory Data Analysis)

After loading the dataset:

✔ Dataset size → `df.shape`

✔ Column structure → `df.columns`

✔ Data types & nulls → `df.info()`

✔ Filtering → Fire-type Pokémon

✔ Conditional filtering → Height > 5.0

✔ Aggregation → Average weight by primary type (`Type1`)

✔ Full data preview → `df.to_string()`

```python
import pandas as pd

df = pd.read_csv("Pokemon_data.csv")

# Each section prints "\n" + a title so the output is easier to read

# 1. Dataset size
print("\nDataset Size:")
print(df.shape)

# 2. Columns
print("\nColumns:")
print(df.columns)

# 3. Data types and non-null counts
print("\nInfo:")
df.info()

# 4. Filter Fire-type Pokemon (primary type only)
print("\nFire Pokemon:")
fire = df[df["Type1"] == "Fire"]
print(fire.head())

# 5. Tall Pokemon (Height > 5.0)
print("\nTallest Pokemon:")
tall = df[df["Height"] > 5.0]
print(tall.head())

# 6. Average weight by primary type
print("\nAverage Weight by Type:")
print(df.groupby("Type1")["Weight"].mean())
```

```
Dataset Size:
(150, 7)

Columns:
Index(['No', 'Name', 'Type1', 'Type2', 'Height', 'Weight', 'Legendary'], dtype='object')

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   No         150 non-null    int64  
 1   Name       150 non-null    object 
 2   Type1      150 non-null    object 
 3   Type2      67 non-null     object 
 4   Height     150 non-null    float64
 5   Weight     150 non-null    float64
 6   Legendary  150 non-null    int64  
dtypes: float64(2), int64(2), object(3)
memory usage: 8.3+ KB

Fire Pokemon:
    No        Name Type1   Type2  Height  Weight  Legendary
3    4  Charmander  Fire     NaN     0.6     8.5          0
4    5  Charmeleon  Fire     NaN     1.1    19.0          0
5    6   Charizard  Fire  Flying     1.7    90.5          0
36  37      Vulpix  Fire     NaN     0.6     9.9          0
37  38   Ninetales  Fire
```
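Here are three follow-ups that fit this EDA pass, sketched on a few rows copied from the printed data rather than the full file. First, `isna().sum()` counts missing values per column. The `df.info()` output implies 150 - 67 = 83 missing `Type2` values in the full dataset. Second, `nlargest` returns the tallest rows without picking a threshold like 5.0. Third, `agg` computes several statistics per group at once.

```python
import pandas as pd

# A few rows copied from the printed data (not the full 150-row file)
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander", "Charizard", "Mewtwo", "Dragonair"],
        "Type1": ["Grass", "Fire", "Fire", "Psychic", "Dragon"],
        "Type2": ["Poison", None, "Flying", None, None],
        "Height": [0.7, 0.6, 1.7, 2.0, 4.0],
        "Weight": [6.9, 8.5, 90.5, 122.0, 16.5],
    }
)

# Missing values per column (Type2 is empty for single-type Pokemon)
print(sample.isna().sum())

# The 3 tallest rows, no threshold needed
print(sample.nlargest(3, "Height")[["Name", "Height"]])

# Mean weight and row count per primary type in one call
print(sample.groupby("Type1")["Weight"].agg(["mean", "count"]))
```

On the real `df`, the same three calls work unchanged. The `count` column shows how many Pokémon each average is based on.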

```python
import pandas as pd

df = pd.read_csv("Pokemon_data.csv")

print(df.to_string())  # .to_string() renders every row instead of the truncated preview
```

      No        Name     Type1     Type2  Height  Weight  Legendary
0      1   Bulbasaur     Grass    Poison     0.7     6.9          0
1      2     Ivysaur     Grass    Poison     1.0    13.0          0
2      3    Venusaur     Grass    Poison     2.0   100.0          0
3      4  Charmander      Fire       NaN     0.6     8.5          0
4      5  Charmeleon      Fire       NaN     1.1    19.0          0
5      6   Charizard      Fire    Flying     1.7    90.5          0
6      7    Squirtle     Water       NaN     0.5     9.0          0
7      8   Wartortle     Water       NaN     1.0    22.5          0
8      9   Blastoise     Water       NaN     1.6    85.5          0
9     10    Caterpie       Bug       NaN     0.3     2.9          0
10    11     Metapod       Bug       NaN     0.7     9.9          0
11    12  Butterfree       Bug    Flying     1.1    32.0          0
12    13      Weedle       Bug    Poison     0.3     3.2          0
13    14      Kakuna       Bug    Poison     0.6