<a href="https://colab.research.google.com/github/anujsaxena/Python/blob/main/FileHandling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **File Handling**

When you’re working with Python, you don’t need to import a library in order to read and write files. File handling is built into the language.

The first thing you’ll need to do is call Python’s built-in open() function, which opens a file and returns a file object.

File objects have methods and attributes that can be used to collect information about the file you opened. They can also be used to manipulate that file.

For example, the mode attribute of a file object tells you which mode a file was opened in. And the name attribute tells you the name of the file that the file object has opened. 
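
As a minimal sketch of these attributes, the snippet below opens a scratch file and inspects the file object that open() returns. The file name demo.txt and the temporary directory are assumptions made for illustration and are not part of the original notebook.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    f = open(path, "w")
    opened_mode = f.mode            # the mode the file was opened in
    opened_name = f.name            # the path that was passed to open()
    closed_before = f.closed        # False while the file is still open
    f.close()
    closed_after = f.closed         # True once close() has been called

print(opened_mode)    # w
print(closed_before)  # False
print(closed_after)   # True
```

The file object keeps reporting its mode and name after it is closed, which is one way to see that the object and the file on disk are separate things.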

You must understand that a file and file object are two wholly separate – yet related – things.
 
**File Types**

What you may know as a file is slightly different in Python.

In Windows, for example, a file can be any item created, edited or manipulated by the user or the OS. That means files can be images, text documents, executables, and much more. Most files are organized by keeping them in folders.

In Python, a file is treated as either text or binary, and the difference between the two file types is important.

Text files are structured as a sequence of lines, where each line is a sequence of characters. Each line is terminated with a special character, called the EOL or End of Line character. The most common is the newline character, written \n in a Python string. Windows text files store the two-character sequence \r\n on disk, which Python translates to \n when the file is read in text mode. The EOL character ends the current line and marks the start of a new one.

The backslash does not end a line by itself. Inside a string literal it starts an escape sequence, so \n stands for a newline. At the very end of a line of source code, a backslash tells the interpreter that the statement continues on the next line. This is useful when you want to break a line in the code but not in the text itself.

A binary file is any type of file that is not a text file. Because of their nature, binary files can only be processed by an application that knows or understands the file’s structure. In other words, they must be applications that can read and interpret binary.
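
The difference between text and binary access can be sketched by reading the same file both ways. The file name lines.txt is an assumption made for illustration; newline="" is passed when writing so that the bytes on disk are the same on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name

    # newline="" turns off newline translation, so "\n" is stored as a single byte.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("first line\nsecond line\n")

    # Text mode ("r") decodes the bytes and returns a str.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode ("rb") returns the raw bytes without decoding them.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__)   # str
print(type(as_bytes).__name__)  # bytes
print(as_bytes)                 # b'first line\nsecond line\n'
```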

**open() Function**

In order to open a file for reading or writing in Python, you must rely on the built-in open() function.
As explained above, open() returns a file object. It is most commonly called with two arguments: the name of the file and the mode.
An argument is nothing more than a value that you pass to a function when you call it. So, for instance, if we pass the file name “Test File” to open(), that name is an argument.
The syntax to open a file in Python is:

*file_object = open("filename", "mode")*

**Mode**

Including a mode argument is optional because a default value of ‘r’ will be assumed if it is omitted. The ‘r’ value stands for read mode, which is just one of several.
The modes are:
1. ‘r’ – Read mode, which is used when the file is only being read. The file must already exist.
2. ‘w’ – Write mode, which is used to write new information to the file. The file is created if it does not exist, and any existing content is erased as soon as the file is opened.
3. ‘a’ – Append mode, which is used to add new data to the end of the file. Existing content is kept.
4. ‘r+’ – Read and write mode, which is used to handle both actions on an existing file without erasing it first.
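
The four modes can be compared side by side in a short sketch. The file name modes.txt and the strings written to it are assumptions chosen for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")  # hypothetical file name

    with open(path, "w") as f:    # "w" creates the file
        f.write("one\n")

    with open(path, "w") as f:    # "w" again erases the previous content
        f.write("two\n")

    with open(path, "a") as f:    # "a" keeps the content and adds to the end
        f.write("three\n")

    with open(path, "r+") as f:   # "r+" reads and writes without erasing
        before = f.read()
        f.seek(0)                 # go back to the start of the file
        f.write("TWO")            # overwrite the first three characters in place

    with open(path) as f:         # the mode defaults to "r"
        after = f.read()

print(repr(before))  # 'two\nthree\n'
print(repr(after))   # 'TWO\nthree\n'
```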



In [1]:
# write to a text file
# write() does not add a newline, so these strings end up on a single line
f = open("test.txt", "w")
f.write("Hello")
f.write("Welcome to Python class")
f.write("Python is fun")
f.write("Python is easy")
f.close()

In [2]:
f = open("test1.txt","w")
f.write("Hello \n")
f.write("Welcome to Python class \n")
f.write("Python is fun \n")
f.write("Python is easy")
f.close()

In [3]:
# read the whole text file back as a single string
f = open("test1.txt", "r")
d = f.read()
f.close()
print(d)

Hello 
Welcome to Python class 
Python is fun 
Python is easy


In [5]:
# read only the first 5 characters
f = open("test1.txt", "r")
d = f.read(5)
f.close()
print(d)

Hello


In [6]:
# read a single line, including its trailing newline
f = open("test1.txt", "r")
d = f.readline()
f.close()
print(d)

Hello 



In [14]:
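# readlines() returns a list of lines; test.txt was written without newlines, so it holds one line
f1 = open("test.txt", "r")
d1 = f1.readlines()
f1.close()
print(d1)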

['HelloWelcome to Python classPython is funPython is easy']


In [13]:
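# the argument is a size hint in characters, not a line count:
# reading stops once the lines read so far exceed it
f1 = open("test1.txt", "r")
d1 = f1.readlines(1)
f1.close()
print(d1)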

['Hello \n']
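
The reading methods used in the cells above can be compared in one place. In this sketch io.StringIO stands in for an open text file so that no file on disk is needed; that substitution is an assumption made for illustration, and the text is the content written to test1.txt earlier.

```python
import io

text = "Hello \nWelcome to Python class \nPython is fun \nPython is easy"

# A fresh in-memory "file" is created for each call so every read starts at the beginning.
first_five = io.StringIO(text).read(5)         # the first 5 characters
first_line = io.StringIO(text).readline()      # one line, newline included
hint_small = io.StringIO(text).readlines(1)    # stops after the first line
hint_larger = io.StringIO(text).readlines(10)  # the first line is 7 characters, so a second is read
all_lines = io.StringIO(text).readlines()      # no hint: every line

print(repr(first_five))   # 'Hello'
print(repr(first_line))   # 'Hello \n'
print(len(hint_small))    # 1
print(len(hint_larger))   # 2
print(len(all_lines))     # 4
```

The argument to readlines() is a size hint, so readlines(1) does not mean "read one line". It returns one line here only because the first line is already longer than one character.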


In [20]:
f = open("test2.txt","a")
f.writelines(["Hello \n", "Welcome to Python class\n","Python is fun\n", "Python is easy\n"])
f.close()

In [21]:
# read back the lines that were appended with writelines()
f1 = open("test2.txt", "r")
d1 = f1.readlines()
f1.close()
print(d1)

['Hello \n', 'Welcome to Python class\n', 'Python is fun\n', 'Python is easy\n']


In [23]:
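# iterate over the file line by line; each line keeps its trailing newline
# and print() adds another, which is why the output is double-spaced
f = open("test2.txt", "r")
for txt in f:
    print(txt)
f.close()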


Hello 

Welcome to Python class

Python is fun

Python is easy



In [25]:
# split each line into a list of words
f = open("test2.txt", "r")
for txt in f:
    wrd = txt.split()
    print(wrd)
f.close()

['Hello']
['Welcome', 'to', 'Python', 'class']
['Python', 'is', 'fun']
['Python', 'is', 'easy']


In [29]:
# the with statement closes the file automatically when the block ends
with open("test3.txt", 'w') as f:
    f.write("Welcome to my python online classes !!!")
    f.write("\n institue de informatica !!!")

In [32]:
with open("test3.txt", 'r') as f:
    d = f.readlines()
    print(d)
for txt in d:
    wrd = txt.split()
    print(wrd)

['Welcome to my python online classes !!!\n', ' institue de informatica !!!']
['Welcome', 'to', 'my', 'python', 'online', 'classes', '!!!']
['institue', 'de', 'informatica', '!!!']
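
The two cells above use the with statement instead of calling close() by hand. A minimal sketch of what that buys you: the file is closed when the block ends, even if an error is raised inside it. The file name notes.txt and the simulated error are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("saved\n")
    closed_after_with = f.closed       # the file was closed when the block ended

    try:
        with open(path) as g:
            raise ValueError("simulated failure while reading")
    except ValueError:
        closed_after_error = g.closed  # closed even though the block raised

print(closed_after_with)   # True
print(closed_after_error)  # True
```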


# **CSV files**

**Module**

CSV

In the csv module documentation you can find the following functions and constants:
1. csv.field_size_limit – return the maximum field size allowed by the parser
2. csv.get_dialect – get the dialect associated with a name
3. csv.list_dialects – show the names of all registered dialects
4. csv.reader – read data from a csv file
5. csv.register_dialect – associate a dialect with a name
6. csv.writer – write data to a csv file
7. csv.unregister_dialect – delete the dialect associated with a name from the dialect registry
8. csv.QUOTE_ALL – quote every field, regardless of type
9. csv.QUOTE_MINIMAL – quote only fields that contain special characters, such as the delimiter or the quote character
10. csv.QUOTE_NONNUMERIC – quote all fields that are not numbers
11. csv.QUOTE_NONE – never quote fields in the output
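
To make the quoting constants and the dialect registry concrete, here is a small sketch that writes one row to an in-memory buffer under several settings. The sample row, the helper function render and the dialect name pipes are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical row: the second field contains a comma, the last two are numbers.
row = ["Nikhil", "COE, Delhi", 2021, 9.0]

def render(**writer_options):
    """Write the sample row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, **writer_options).writerow(row)
    return buffer.getvalue()

minimal = render(quoting=csv.QUOTE_MINIMAL)
quote_all = render(quoting=csv.QUOTE_ALL)
nonnumeric = render(quoting=csv.QUOTE_NONNUMERIC)

# Register a pipe-delimited dialect under a name, use it, then remove it again.
csv.register_dialect("pipes", delimiter="|")
is_registered = "pipes" in csv.list_dialects()
piped = render(dialect="pipes")
csv.unregister_dialect("pipes")

print(repr(minimal))     # 'Nikhil,"COE, Delhi",2021,9.0\r\n'
print(repr(quote_all))   # '"Nikhil","COE, Delhi","2021","9.0"\r\n'
print(repr(nonnumeric))  # '"Nikhil","COE, Delhi",2021,9.0\r\n'
print(repr(piped))       # 'Nikhil|COE, Delhi|2021|9.0\r\n'
```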




In [33]:
# importing the csv module 
import csv 
  
# field names 
fields = ['Name', 'Branch', 'Year', 'CGPA'] 
  
# data rows of csv file 
rows = [ ['Nikhil', 'COE', '2021', '9.0'], 
         ['Sanchit', 'COE', '2020', '9.1'], 
         ['Aditya', 'IT', '2021', '9.3'], 
         ['Sagar', 'SE', '2021', '9.5'], 
         ['Prateek', 'MCE', '2020', '7.8'], 
         ['Sahil', 'EP', '2019', '9.1']] 
  
# name of csv file 
filename = "university_records.csv"
  
# writing to csv file (newline='' lets the csv module control the line endings)
with open(filename, 'w', newline='') as csvfile: 
    # creating a csv writer object 
    csvwriter = csv.writer(csvfile)      
    # writing the headers 
    csvwriter.writerow(fields) 
    # writing the data rows 
    csvwriter.writerows(rows)
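
The csv documentation recommends opening csv files with newline=''. The sketch below, which uses a hypothetical file name rows.csv and a shortened data set, reads the written file back in binary mode to show why: the writer already ends each row with \r\n. Without newline='', Windows would translate the \n again and every row would be followed by a blank line.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rows.csv")  # hypothetical file name

    # newline='' passes the writer's own line endings through unchanged.
    with open(path, "w", newline="") as csvfile:
        csv.writer(csvfile).writerows([["Name", "CGPA"], ["Nikhil", "9.0"]])

    # Binary mode shows exactly what ended up on disk.
    with open(path, "rb") as raw:
        raw_bytes = raw.read()

print(raw_bytes)  # b'Name,CGPA\r\nNikhil,9.0\r\n'
```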


In [34]:
#from a dictionary

import csv

# my data rows as dictionary objects 
mydict =[{'branch': 'COE', 'cgpa': '9.0', 'name': 'Nikhil', 'year': '2020'}, 
         {'branch': 'COE', 'cgpa': '9.1', 'name': 'Sanchit', 'year': '2021'}, 
         {'branch': 'IT', 'cgpa': '9.3', 'name': 'Aditya', 'year': '2020'}, 
         {'branch': 'SE', 'cgpa': '9.5', 'name': 'Sagar', 'year': '2021'}, 
         {'branch': 'MCE', 'cgpa': '7.8', 'name': 'Prateek', 'year': '2021'}, 
         {'branch': 'EP', 'cgpa': '9.1', 'name': 'Sahil', 'year': '2019'}] 
  
# field names 
fields = ['name', 'branch', 'year', 'cgpa'] 
  
# name of csv file 
filename = "university_dict.csv"
# writing to csv file (newline='' lets the csv module control the line endings)
with open(filename, 'w', newline='') as csvfile: 
    # creating a csv dict writer object 
    writer = csv.DictWriter(csvfile, fieldnames = fields)  
    # writing headers (field names) 
    writer.writeheader() 
    # writing data rows 
    writer.writerows(mydict) 



In [35]:
# Read CSV file

import csv
with open("university_records.csv", newline='') as f:
    data = csv.reader(f)
    for row in data:
        print(row)



['Name', 'Branch', 'Year', 'CGPA']
['Nikhil', 'COE', '2021', '9.0']
['Sanchit', 'COE', '2020', '9.1']
['Aditya', 'IT', '2021', '9.3']
['Sagar', 'SE', '2021', '9.5']
['Prateek', 'MCE', '2020', '7.8']
['Sahil', 'EP', '2019', '9.1']


In [38]:
import csv

# csv file name
filename = "university_records.csv"

# initializing the headers and rows list
fields = []
rows = []

# reading csv file
with open(filename, 'r', newline='') as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)

    # extracting field names through first row
    fields = next(csvreader)

    # extracting each data row one by one
    for row in csvreader:
        rows.append(row)

    # line_num counts every line read so far, including the header
    print("Total no. of lines read: %d" % (csvreader.line_num))

# printing the field names
print('Header names are: ' + ', '.join(fields))

# printing first 5 rows
print('\nFirst 5 rows are:\n')
for row in rows[:5]:
    # printing each column of a row, right-aligned in 10 characters
    for col in row:
        print("%10s" % col, end="")
    print()


Total no. of lines read: 7
Header names are: Name, Branch, Year, CGPA

First 5 rows are:

    Nikhil       COE      2021       9.0
   Sanchit       COE      2020       9.1
    Aditya        IT      2021       9.3
     Sagar        SE      2021       9.5
   Prateek       MCE      2020       7.8



In [40]:
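import csv

# DictReader uses the first row as keys and yields one mapping per data row
# (an OrderedDict on Python versions before 3.8, a plain dict from 3.8 on)
with open("university_records.csv", newline='') as f:
    reader = csv.DictReader(f)
    for txt in reader:
        print(txt)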


OrderedDict([('Name', 'Nikhil'), ('Branch', 'COE'), ('Year', '2021'), ('CGPA', '9.0')])
OrderedDict([('Name', 'Sanchit'), ('Branch', 'COE'), ('Year', '2020'), ('CGPA', '9.1')])
OrderedDict([('Name', 'Aditya'), ('Branch', 'IT'), ('Year', '2021'), ('CGPA', '9.3')])
OrderedDict([('Name', 'Sagar'), ('Branch', 'SE'), ('Year', '2021'), ('CGPA', '9.5')])
OrderedDict([('Name', 'Prateek'), ('Branch', 'MCE'), ('Year', '2020'), ('CGPA', '7.8')])
OrderedDict([('Name', 'Sahil'), ('Branch', 'EP'), ('Year', '2019'), ('CGPA', '9.1')])
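
Both csv.reader and csv.DictReader return every field as a string, as the quoted '2021' and '9.0' in the output above show. A minimal sketch of converting the fields before doing arithmetic follows. It reads an abbreviated in-memory copy of the records through io.StringIO, which is an assumption made to keep the example self-contained.

```python
import csv
import io

# Abbreviated copy of university_records.csv, kept in memory for this sketch.
csv_text = (
    "Name,Branch,Year,CGPA\n"
    "Nikhil,COE,2021,9.0\n"
    "Sagar,SE,2021,9.5\n"
    "Prateek,MCE,2020,7.8\n"
)

records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    records.append({
        "Name": row["Name"],
        "Branch": row["Branch"],
        "Year": int(row["Year"]),      # '2021' -> 2021
        "CGPA": float(row["CGPA"]),    # '9.0'  -> 9.0
    })

top_student = max(records, key=lambda r: r["CGPA"])
average_cgpa = sum(r["CGPA"] for r in records) / len(records)

print(top_student["Name"])     # Sagar
print(round(average_cgpa, 2))  # 8.77
```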


# **Pandas for CSV files**

In [41]:
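import pandas as pd

# header=0 (the default) uses the first line of the file as the column names
data = pd.read_csv("university_records.csv", header=0)
print(data)
print("---First row taken as header--------")
# header=1 uses the second line (the first data row) as the column names
# and skips the line above it
data = pd.read_csv("university_records.csv", header=1)
print(data)
# shape, column labels and column types of the second DataFrame
print(data.shape)
print(data.columns)
print(data.dtypes)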



      Name Branch  Year  CGPA
0   Nikhil    COE  2021   9.0
1  Sanchit    COE  2020   9.1
2   Aditya     IT  2021   9.3
3    Sagar     SE  2021   9.5
4  Prateek    MCE  2020   7.8
5    Sahil     EP  2019   9.1
---First row taken as header--------
    Nikhil  COE  2021  9.0
0  Sanchit  COE  2020  9.1
1   Aditya   IT  2021  9.3
2    Sagar   SE  2021  9.5
3  Prateek  MCE  2020  7.8
4    Sahil   EP  2019  9.1
(5, 4)
Index(['Nikhil', 'COE', '2021', '9.0'], dtype='object')
Nikhil     object
COE        object
2021        int64
9.0       float64
dtype: object
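
The effect of the header argument can be checked in isolation. This sketch assumes pandas is installed and reads an abbreviated in-memory copy of the records through io.StringIO, so it does not depend on university_records.csv being present.

```python
import io

import pandas as pd

# Abbreviated copy of the records, kept in memory for this sketch.
csv_text = (
    "Name,Branch,Year,CGPA\n"
    "Nikhil,COE,2021,9.0\n"
    "Sanchit,COE,2020,9.1\n"
    "Aditya,IT,2021,9.3\n"
)

# header=0: the first line supplies the column names, leaving 3 data rows.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=1: the second line supplies the column names and the first line is
# skipped, so the Nikhil record is lost as data and only 2 rows remain.
second_line_header = pd.read_csv(io.StringIO(csv_text), header=1)

print(list(with_header.columns))         # ['Name', 'Branch', 'Year', 'CGPA']
print(with_header.shape)                 # (3, 4)
print(list(second_line_header.columns))  # ['Nikhil', 'COE', '2021', '9.0']
print(second_line_header.shape)          # (2, 4)
```

For a file whose first line really is the header, header=0 is the setting to use. A value of 1 makes sense only when the file starts with a line that should be ignored.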
