# Session 5: File Operations, pandas, NumPy

# File Operations in Python

A text file supports three main actions:

1. Read  
2. Write  
3. Append  

## File Modes
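- `r`  → Read (default; the file must exist)  
- `w`  → Write (creates the file, or overwrites existing content)  
- `a`  → Append (creates the file if missing; writes go to the end)  
- `r+` → Read + Write (the file must exist; content is not truncated)  
- `a+` → Append + Read  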

We use the `with open(...)` statement for safe file handling: the file is closed automatically when the block ends, even if an error occurs inside it.



In [1]:
with open("sample.txt", "w") as f:
    f.write("Hello\n")
    f.write("This is Python file handling\n")
    f.write("Writing using write()\n")
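To see what "safe" means here, this minimal sketch checks the file object's `closed` attribute after a `with` block and shows a roughly equivalent `try`/`finally` form. The file name `demo_with.txt` and the temporary directory are assumptions, used only so the example does not touch `sample.txt`.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo_with.txt")

with open(path, "w") as f:
    f.write("Hello\n")
closed_after_with = f.closed   # the with block has already closed the file

# Roughly what the with statement does for you
g = open(path, "a")
try:
    g.write("World\n")
finally:
    g.close()                  # runs even if write() raises

with open(path) as f:          # mode defaults to "r"
    text = f.read()

print(closed_after_with, g.closed)
print(text)
```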


## Reading Complete File

`read()` returns all the content as one string.


In [7]:
with open("sample.txt", "r") as f:
    content = f.read()
    print(content)


Hello
This is Python file handling
Writing using write()



## Reading Line by Line

Iterating over the file object with a `for` loop yields one line at a time.
`strip()` removes leading and trailing whitespace, including the trailing newline.


In [8]:
with open("sample.txt", "r") as f:
    for line in f:
        print(line.strip())


Hello
This is Python file handling
Writing using write()


## Reading First N Characters


In [9]:
with open("sample.txt", "r") as f:
    print(f.read(10))   # reads first 10 characters


Hello
This
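The newline after `Hello` counts as one of the 10 characters, which is why the output stops at `This`. As a small sketch (the file name `demo_read.txt` is made up for illustration), repeated `read(n)` calls continue from where the previous call stopped:

```python
import os
import tempfile

# Hypothetical scratch file with two lines
path = os.path.join(tempfile.mkdtemp(), "demo_read.txt")
with open(path, "w") as f:
    f.write("Hello\nThis is Python file handling\n")

with open(path, "r") as f:
    first = f.read(5)    # first 5 characters
    second = f.read(5)   # next 5 characters, starting with the newline
    rest = f.read()      # everything that is left
    at_end = f.read()    # empty string once the end of the file is reached

print(repr(first), repr(second), repr(rest), repr(at_end))
```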


## Using readline()

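Each `readline()` call reads a single line, including its trailing newline.

The cursor then moves to the start of the next line, so the following call returns the next line. Because `print()` adds its own newline, the output shows a blank line after each line.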


In [12]:
with open("sample.txt", "r") as f:
    print(f.readline())  # line 1
    print(f.readline())  # line 2
    print(f.readline())  # line 3


Hello

This is Python file handling

Writing using write()
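When the number of lines is not known in advance, `readline()` can be called in a loop until it returns an empty string, which signals the end of the file. A minimal sketch, assuming a throwaway file named `demo_readline.txt`:

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), "demo_readline.txt")
with open(path, "w") as f:
    f.write("Hello\nThis is Python file handling\n")

collected = []
with open(path, "r") as f:
    while True:
        line = f.readline()
        if line == "":          # empty string means end of file
            break
        collected.append(line)
        print(line, end="")     # end="" avoids the doubled newline
```

A blank line inside the file comes back as `"\n"`, not `""`, so it does not end the loop early.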



## Using readlines()

`readlines()` returns all lines as a list of strings, each keeping its trailing newline.


In [14]:
with open("sample.txt", "r") as f:
    lines = f.readlines()
    print(lines)


['Hello\n', 'This is Python file handling\n', 'Writing using write()\n']


In [15]:
clean = [line.strip() for line in lines]
clean


['Hello', 'This is Python file handling', 'Writing using write()']
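Note that `strip()` also removes spaces at both ends of a line, not just the newline. The sketch below compares three ways of cleaning lines; the file name and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; the first line has extra spaces on purpose
path = os.path.join(tempfile.mkdtemp(), "demo_lines.txt")
with open(path, "w") as f:
    f.write("  Hello  \nThis is Python\n")

with open(path) as f:
    raw = f.readlines()

stripped = [line.strip() for line in raw]            # drops spaces and newline
newline_only = [line.rstrip("\n") for line in raw]   # drops only the newline

with open(path) as f:
    split = f.read().splitlines()                    # same as newline_only here

print(raw, stripped, newline_only, split, sep="\n")
```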

## Append to a File

`a` mode writes new content at the end of the file without deleting what is already there.


In [16]:
with open("sample.txt", "a") as f:
    f.write("This is appended line\n")
    f.write("Another appended line\n")
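The difference between `a` and `w` is easiest to see side by side. This is a small sketch using a hypothetical file `demo_modes.txt` and a helper `read_text` defined only for this example:

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), "demo_modes.txt")

def read_text(p):
    """Return the whole file as one string (helper for this example only)."""
    with open(p) as f:
        return f.read()

with open(path, "w") as f:     # creates the file
    f.write("first\n")

with open(path, "a") as f:     # keeps "first", adds after it
    f.write("second\n")
after_append = read_text(path)

with open(path, "w") as f:     # discards everything written so far
    f.write("third\n")
after_write = read_text(path)

print(repr(after_append), repr(after_write))
```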


## Copying Content from One File to Another

This follows ETL logic:
- Extract → Read  
- Transform → (optional)  
- Load → Write to new file  


In [17]:
with open("sample.txt", "r") as src:
    with open("copy.txt", "w") as dest:
        for line in src:
            dest.write(line)
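The copy above skips the optional Transform step. As an illustrative sketch, the version below opens both files in a single `with` statement, drops blank lines, and upper-cases the rest. The file names and the choice of transformation are assumptions, not part of the session material.

```python
import os
import tempfile

# Hypothetical source and destination files
folder = tempfile.mkdtemp()
src_path = os.path.join(folder, "source.txt")
dest_path = os.path.join(folder, "copy_upper.txt")

with open(src_path, "w") as f:
    f.write("Hello\n\nThis is Python file handling\n")

with open(src_path, "r") as src, open(dest_path, "w") as dest:
    for line in src:                   # Extract: read line by line
        if line.strip():               # Transform: skip blank lines
            dest.write(line.upper())   # Transform + Load: write upper-cased text

with open(dest_path) as f:
    result = f.read()
print(result)
```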


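## Using seek() to Overwrite Text at a Specific Position

`seek(pos)` moves the cursor to a specific position.

`r+` mode allows reading and writing without truncating the file. Writing after `seek()` overwrites the existing characters from that position; it does not insert text or shift the rest of the file.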


In [19]:
with open("sample.txt", "r+") as f:
    f.seek(12)              # move cursor to position 12
    f.write("InsertedText") # overwrite from position 12
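Writing after `seek()` replaces existing characters rather than pushing them along. A true insert needs the content to be read, spliced, and written back. The sketch below shows both on a hypothetical one-line file with plain ASCII text, where character positions and byte offsets coincide:

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), "demo_seek.txt")
with open(path, "w") as f:
    f.write("Hello World\n")

# Overwrite: "World" is replaced character by character
with open(path, "r+") as f:
    f.seek(6)
    f.write("There")
with open(path) as f:
    overwritten = f.read()

# Insert: read everything, splice the new text in, write it all back
with open(path, "r+") as f:
    content = f.read()
    f.seek(0)
    f.write(content[:6] + "Big " + content[6:])
    f.truncate()               # cut off any leftover old content
with open(path) as f:
    inserted = f.read()

print(repr(overwritten), repr(inserted))
```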


# Excel File Operations

We use pandas for structured data (columns and rows). Reading and writing `.xlsx` files also requires an Excel engine such as `openpyxl` to be installed.


In [21]:
import pandas as pd

df = pd.read_excel("students.xlsx")
df


|   | Name | Marks |
|---|------|-------|
| 0 | Alice | 85 |
| 1 | Bob | 90 |
| 2 | Charlie | 78 |
| 3 | ruchik | 87 |
| 4 | sushma | 11 |
| 5 | Thauja | 44 |


## Writing DataFrame to Excel


In [23]:
data = {
    "Name": ["Akhil", "Riya", "Tanisha"],
    "Marks": [78, 82, 44]
}

df = pd.DataFrame(data)
df.to_excel("students.xlsx", index=False)


# CSV Operations using pandas


In [None]:
import pandas as pd

# Load the Titanic dataset from the datasciencedojo/datasets repository on GitHub
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
titanic = pd.read_csv(url)

# Create a smaller version with 100 rows
titanic_sample100 = titanic.head(100)

# Save it locally
titanic_sample100.to_csv("titanic_sample100.csv", index=False)

titanic_sample100.head()
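The `index=False` argument matters when the file is read back. Without it, the row index is saved as an extra unnamed column, which `read_csv` loads as `Unnamed: 0`. A minimal sketch with made-up data, using in-memory text instead of a real file:

```python
import io
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"Name": ["Akhil", "Riya"], "Marks": [78, 82]})

with_index = df.to_csv()                # index written as a first, unnamed column
without_index = df.to_csv(index=False)  # only the real columns are written

back_with = pd.read_csv(io.StringIO(with_index))
back_without = pd.read_csv(io.StringIO(without_index))

# If a file was already saved with the index, index_col=0 restores it
back_fixed = pd.read_csv(io.StringIO(with_index), index_col=0)

print(list(back_with.columns))
print(list(back_without.columns))
print(list(back_fixed.columns))
```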


In [41]:
titanic = pd.read_csv("titanic_sample100.csv")
titanic.head()


|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## View Data
- `head(n)` → first n rows (default 5)
- `tail(n)` → last n rows (default 5)
- `dtypes` → data type of each column


In [42]:
titanic.head(10)   # first 10 rows
titanic.tail(10)   # last 10 rows
titanic.dtypes     # only this last expression is displayed by the notebook


PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
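A notebook cell displays only its last expression, so only `dtypes` appears above. To see every result, or when running as a script, wrap each call in `print()`. The sketch below uses a small made-up DataFrame rather than the Titanic data:

```python
import pandas as pd

# Made-up rows for illustration, not the real Titanic data
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [22.0, 38.0, None, 35.0, 35.0],
    "Pclass": [3, 1, 3, 1, 3],
})

first_two = df.head(2)   # rows 0 and 1
last_two = df.tail(2)    # rows 3 and 4

# print() shows every result, not just the last one
print(first_two)
print(last_two)
print(df.dtypes)
```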

## Summary Statistics


In [43]:
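titanic.describe()                  # numeric columns: count, mean, std, min, quartiles, max
titanic.describe(include="object")  # text columns: count, unique, top, freq (only this output is displayed)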


|   | Name | Sex | Ticket | Cabin | Embarked |
|---|---|---|---|---|---|
| count | 100 | 100 | 100 | 20 | 99 |
| unique | 100 | 2 | 97 | 19 | 3 |
| top | Braund, Mr. Owen Harris | male | 19950 | C23 C25 C27 | S |
| freq | 1 | 61 | 2 | 2 | 69 |


## Column Slicing Examples


In [45]:
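titanic["Ticket"].head()     # first 5 values
titanic["Ticket"][4:10]      # rows at positions 4 to 9 (the stop value is excluded)
titanic["Ticket"][4:10:2]    # every second row from 4 to 9; only this output is displayed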


4    373450
6     17463
8    347742
Name: Ticket, dtype: object
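The slices above follow the usual `start:stop:step` rule, with `stop` excluded. As a sketch on a made-up Series, `.iloc` makes the positional intent explicit, while `.loc` slices by label and includes the stop label:

```python
import pandas as pd

# Made-up values standing in for ticket numbers
s = pd.Series(list("abcdefghijkl"), name="Ticket")

by_position = s.iloc[4:10]       # positions 4..9, stop excluded
every_second = s.iloc[4:10:2]    # positions 4, 6, 8
by_label = s.loc[4:10]           # labels 4..10, stop included

print(every_second)
print(len(by_position), len(by_label))
```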

## Checking Missing Values


In [46]:
titanic["Age"].isnull().sum()


np.int64(22)
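`isnull()` marks each missing value as `True`, and `sum()` counts those `True` values. Applied to a whole DataFrame, the same pattern gives a count per column, and `mean()` gives the share of missing values. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "C123"],
})

missing_per_column = df.isnull().sum()   # count of missing values per column
missing_share = df.isnull().mean()       # fraction of missing values per column

print(missing_per_column)
print(missing_share)
```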

## Replace Cabin with Its First Character


In [47]:
titanic["Cabin"] = titanic["Cabin"].astype(str)
titanic["Cabin"] = titanic["Cabin"].str[0]
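One thing to watch: depending on the pandas version, `astype(str)` may turn missing values into the literal text `nan`, in which case a missing cabin ends up as `n`. A hedged alternative sketch, on made-up data, applies `.str[0]` directly so missing values stay missing, then labels them explicitly. The label `"U"` is an arbitrary choice for this example.

```python
import numpy as np
import pandas as pd

# Made-up cabin values for illustration
cabin = pd.Series(["C85", np.nan, "E46"], name="Cabin")

deck = cabin.str[0]                # first character; missing values stay missing
deck_labeled = deck.fillna("U")    # "U" is an arbitrary label for unknown

print(deck.isna().tolist())
print(deck_labeled.tolist())
```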


## Create New Column: Family Size


In [48]:
titanic["Family"] = titanic["SibSp"] + titanic["Parch"]
titanic[["SibSp", "Parch", "Family"]].head()


|   | SibSp | Parch | Family |
|---|---|---|---|
| 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 |
| 2 | 0 | 0 | 0 |
| 3 | 1 | 0 | 1 |
| 4 | 0 | 0 | 0 |


# NumPy Basics


In [50]:
import numpy as np

arr = np.array([10, 20, 30])
arr


array([10, 20, 30])

## 2D Matrix


In [52]:
mat = np.array([[10, 20], [30, 40]])
mat


array([[10, 20],
       [30, 40]])
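A few basic properties and indexing forms for the same matrix, as a short sketch:

```python
import numpy as np

mat = np.array([[10, 20], [30, 40]])

shape = mat.shape          # (rows, columns)
ndim = mat.ndim            # number of dimensions
element = mat[1, 0]        # row 1, column 0
second_col = mat[:, 1]     # every row, column 1
transposed = mat.T         # rows and columns swapped

print(shape, ndim, element)
print(second_col)
print(transposed)
```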

## Generating Random Matrix


In [53]:
np.random.seed(1)
data = np.random.randn(5, 4)
data


array([[ 1.62434536, -0.61175641, -0.52817175, -1.07296862],
       [ 0.86540763, -2.3015387 ,  1.74481176, -0.7612069 ],
       [ 0.3190391 , -0.24937038,  1.46210794, -2.06014071],
       [-0.3224172 , -0.38405435,  1.13376944, -1.09989127],
       [-0.17242821, -0.87785842,  0.04221375,  0.58281521]])
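`np.random.seed(1)` makes the draws repeatable: resetting the seed and drawing again gives the same matrix. The sketch below checks that, and also shows NumPy's newer `default_rng` generator as an alternative. The generator produces different numbers from the legacy functions even with the same seed.

```python
import numpy as np

np.random.seed(1)
a = np.random.randn(5, 4)      # standard normal values, shape 5 x 4

np.random.seed(1)
b = np.random.randn(5, 4)      # same seed, same values

same = np.array_equal(a, b)

# Newer interface: a generator object with its own seed
rng = np.random.default_rng(1)
c = rng.standard_normal((5, 4))

print(same, c.shape)
```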

# loc vs iloc

`iloc` → integer positions (0-based)  
`loc`  → index and column labels  


In [54]:
df = pd.DataFrame(data, index=list("ABCDE"), columns=list("WXYZ"))

df.iloc[2]       # third row
df.loc["C"]      # row C
df.loc["C", "W"] # specific value


np.float64(0.31903909605709857)
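One difference that often surprises: slices with `iloc` exclude the stop position, while slices with `loc` include the stop label. A sketch with simple integer values in place of the random matrix, so the results are easy to check by eye:

```python
import numpy as np
import pandas as pd

# Integers 0..19 arranged like the example above
df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list("ABCDE"), columns=list("WXYZ"))

by_position = df.iloc[0:2]                    # rows A and B (stop excluded)
by_label = df.loc["A":"C"]                    # rows A, B and C (stop included)
same_cell = df.loc["C", "W"] == df.iloc[2, 0] # same cell, two ways
corners = df.loc[["A", "E"], ["W", "Z"]]      # selected rows and columns

print(by_position)
print(by_label)
print(corners)
```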

## Reset Index
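`reset_index()` replaces the index with a default 0, 1, 2, ... integer index and moves the old index into a column named `index`. With `inplace=True`, the DataFrame itself is modified instead of a new one being returned.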


In [55]:
df.reset_index(inplace=True)
df


|   | index | W | X | Y | Z |
|---|---|---|---|---|---|
| 0 | A | 1.624345 | -0.611756 | -0.528172 | -1.072969 |
| 1 | B | 0.865408 | -2.301539 | 1.744812 | -0.761207 |
| 2 | C | 0.319039 | -0.24937 | 1.462108 | -2.060141 |
| 3 | D | -0.322417 | -0.384054 | 1.133769 | -1.099891 |
| 4 | E | -0.172428 | -0.877858 | 0.042214 | 0.582815 |
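Two related options, sketched on a small made-up DataFrame: `drop=True` discards the old index instead of keeping it as a column, and `set_index` reverses the operation.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"W": [1.0, 2.0, 3.0]}, index=list("ABC"))

kept = df.reset_index()               # old index becomes a column named "index"
dropped = df.reset_index(drop=True)   # old index is discarded
restored = kept.set_index("index")    # move the column back into the index

print(kept)
print(dropped)
print(restored)
```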
