## __Reading and Writing CSV Files__

This section shows how to read and write CSV files in two ways: with Python's built-in csv module, and with the pandas library.


## Working with CSV
## Step 1: Import the CSV Library and Read the CSV File

Let's explore the CSV library.

- Import the CSV library
- Open the file Sample_dataO.csv by passing its path as an argument to the open() function
- Load the file using the csv.reader() method
- Print the contents of the file using a for loop


In [4]:
import csv
csvFile = open('../../Datasets/Sample_dataO.csv')
csvReader = csv.reader(csvFile)

- csvReader is the reader object returned by csv.reader()
- Check the type of the csvReader object
- This step is not necessary, but it confirms that a reader was created (the file contents are only read once you iterate over it)

In [5]:
type(csvReader)

_csv.reader

**Observation**

- The type of csvReader is _csv.reader.

Now let us print the contents of the file using a for loop
* Declare a variable that iterates through the csvReader
* Print the line

In [6]:
for line in csvReader:
    print(line)

['name', 'age', 'location', 'phpref']
['john', '35', 'germany', 'iphone']
['mary', '23', 'france', 'motorola']
['johnie', '40', 'germany', 'samsung']
['Aarbhiii', '27', 'italy', 'iphone']
['Gane', '20', 'Romania', 'sony']


## Step 2: Distinguish the Header and the Rows in the CSV File

- Open the CSV file for reading using the **with** and **open()** function
- Read the CSV file using the **reader()** function 
- Initiate the variable to **count** the rows
- Traverse through the lines in csvReader
- Check if the **count** is 0 and print the line as header if it is
- Else, print it as a row
- Increment the **count** by one



In [7]:
with open('../../Datasets/Sample_dataO.csv','r') as csvFile:
    csvReader = csv.reader(csvFile)
    count = 0
    for line in csvReader:
        if count == 0:
            print('Header: '+str(line))
        else:
            print('Row: '+str(line))
        count+=1

Header: ['name', 'age', 'location', 'phpref']
Row: ['john', '35', 'germany', 'iphone']
Row: ['mary', '23', 'france', 'motorola']
Row: ['johnie', '40', 'germany', 'samsung']
Row: ['Aarbhiii', '27', 'italy', 'iphone']
Row: ['Gane', '20', 'Romania', 'sony']


**Observation**
* In this file, the first row is the header. CSV files are not required to have one, so check the file before assuming it.
* All other lines are rows.
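
As a variation on the counter, the header can be pulled off with `next()` before looping over the remaining rows. The sketch below is only an illustration: it writes a small temporary file (a hypothetical stand-in for Sample_dataO.csv) so that it runs on its own.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for Sample_dataO.csv so the example is self-contained.
sample = "name,age,location,phpref\njohn,35,germany,iphone\nmary,23,france,motorola\n"
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

with open(path, "r", newline="") as csvFile:
    csvReader = csv.reader(csvFile)
    header = next(csvReader)   # consume the first line as the header
    rows = list(csvReader)     # everything left is data

print("Header:", header)
for row in rows:
    print("Row:", row)
```

Note that every value comes back as a string, including the ages.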

## Step 3: Write a New Row to the CSV File

- Open the CSV file in append mode (**'a'**) using **with** and the **open()** function, so that the existing rows are kept
- Create a writer object using the **writer()** function
- Write a new row to the CSV file using the **writerow()** function, and mention the row to be written as an argument to the **writerow()** function


In [8]:
with open('../../Datasets/Sample_dataO.csv','a',newline = '') as csvFile:
    csvWriter = csv.writer(csvFile)
    csvWriter.writerow(['Aarbhiii',27,'italy','iphone'])

In [9]:
with open('../../Datasets/Sample_dataO.csv','r') as csvFile:
    csvReader = csv.reader(csvFile)
    count = 0
    for line in csvReader:
        if count == 0:
            print('Header: '+str(line))
        else:
            print('Row: '+str(line))
        count+=1

Header: ['name', 'age', 'location', 'phpref']
Row: ['john', '35', 'germany', 'iphone']
Row: ['mary', '23', 'france', 'motorola']
Row: ['johnie', '40', 'germany', 'samsung']
Row: ['Aarbhiii', '27', 'italy', 'iphone']
Row: ['Gane', '20', 'Romania', 'sony']
Row: ['Aarbhiii', '27', 'italy', 'iphone']
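
The choice of file mode matters here: `'w'` truncates the file, while `'a'` adds to the end. The following sketch contrasts the two on a temporary file; the file name and rows are made up for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "people.csv")  # hypothetical file name

# 'w' creates (or truncates) the file: write the header and the initial rows.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "age", "location", "phpref"])
    writer.writerows([["john", 35, "germany", "iphone"],
                      ["mary", 23, "france", "motorola"]])

# 'a' keeps the existing content and adds the new row at the end.
with open(path, "a", newline="") as f:
    csv.writer(f).writerow(["Gane", 20, "Romania", "sony"])

with open(path, newline="") as f:
    result = list(csv.reader(f))
print(result)
```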


## Step 4: Use the Pandas Library to Read and Modify CSV Files

- Import the pandas library
- Read the CSV file into a pandas DataFrame object using the read_csv() function



In [10]:
import pandas as pd

df = pd.read_csv('../../Datasets/Sample_dataO.csv')



Print the contents of the DataFrame to the console.
This step is not strictly necessary, but it can be helpful for verifying that the DataFrame was loaded correctly

In [11]:
df

Unnamed: 0,name,age,location,phpref
0,john,35,germany,iphone
1,mary,23,france,motorola
2,johnie,40,germany,samsung
3,Aarbhiii,27,italy,iphone
4,Gane,20,Romania,sony
5,Aarbhiii,27,italy,iphone


**Observation**
* The DataFrame has a tabular structure with rows and columns.
* The first column represents the index of the rows. 
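
Unlike csv.reader(), which returns every field as a string, read_csv() infers a type for each column. A minimal sketch, using inline text as a hypothetical stand-in for Sample_dataO.csv:

```python
import io
import pandas as pd

# Hypothetical inline data standing in for Sample_dataO.csv.
text = "name,age,location,phpref\njohn,35,germany,iphone\nmary,23,france,motorola\n"
df_demo = pd.read_csv(io.StringIO(text))

print(df_demo.shape)         # (number of rows, number of columns)
print(df_demo.dtypes)        # 'age' is parsed as an integer column
print(list(df_demo.index))   # default integer index starting at 0
```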

## Step 5: Add a New Row to the DataFrame

To add a new row to the DataFrame:
- Use the loc[ ] indexer with len(df.index), which is the first unused label when the DataFrame has the default 0-based integer index
- Assign the new row item list to this location



In [12]:

df.loc[len(df.index)] = ['Gane',20,'Romania','sony']

Print the updated contents of the DataFrame to the console:

In [13]:
df

Unnamed: 0,name,age,location,phpref
0,john,35,germany,iphone
1,mary,23,france,motorola
2,johnie,40,germany,samsung
3,Aarbhiii,27,italy,iphone
4,Gane,20,Romania,sony
5,Aarbhiii,27,italy,iphone
6,Gane,20,Romania,sony


**Observation**
* The row is added as the last row of the DataFrame.
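
The loc[ ] approach relies on the default integer index; if a row labelled len(df.index) already exists, it is overwritten instead. One alternative that does not depend on the labels is pd.concat(). The sketch below uses a small made-up DataFrame and is not part of the original notebook.

```python
import pandas as pd

# Small hypothetical DataFrame with the same columns as the sample file.
df_demo = pd.DataFrame(
    [["john", 35, "germany", "iphone"], ["mary", 23, "france", "motorola"]],
    columns=["name", "age", "location", "phpref"],
)

new_row = pd.DataFrame([["Gane", 20, "Romania", "sony"]], columns=df_demo.columns)

# ignore_index=True renumbers the combined rows 0..n-1.
df_demo = pd.concat([df_demo, new_row], ignore_index=True)
print(df_demo)
```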

## Step 6: Write the Updated DataFrame Back to the CSV File

- Use the to_csv() function to write the contents of the DataFrame to a CSV file (here a new file, Sample_dataO2.csv, so the original is left unchanged)


In [14]:
#The "index = False" argument is used to skip writing the index that has been created by pandas internally to the CSV file.
df.to_csv('../../Datasets/Sample_dataO2.csv',index = False)
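
To see what index = False changes, the sketch below writes the same made-up DataFrame both ways into in-memory buffers (an assumption made so that nothing is written to disk) and reads the first one back.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({"name": ["john", "mary"], "age": [35, 23]})

with_index = io.StringIO()
df_demo.to_csv(with_index)                  # default: index written as the first column
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)  # index left out

print(with_index.getvalue())
print(without_index.getvalue())

# Reading the first version back yields an extra 'Unnamed: 0' column.
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(list(reloaded.columns))
```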

## More Examples

In [15]:
#creating a file
a = open("../../Datasets/pytestcr.txt","w+")

In [16]:
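#Writing data into a file
#Note: in text mode "\n" is enough; writing "\r\n" can produce doubled line breaks on Windows, which is why blank lines appear in the output further down.
for i in range(10):
    a.write("This is line %d\r\n" % (i+1))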

In [17]:
#closing the file
a.close()

In [18]:
#opening and appending
a = open("../../Datasets/pytestcr.txt","a+")
for i in range(5):
     a.write("This is appended line %d\r\n" % (i+1))
a.close()

In [19]:
#reading (the with block closes the file automatically)
with open("../../Datasets/pytestcr.txt","r") as a:
    contents = a.read()
print(contents)

This is line 1

This is line 2

This is line 3

This is line 4

This is line 5

This is line 6

This is line 7

This is line 8

This is line 9

This is line 10

This is appended line 1

This is appended line 2

This is appended line 3

This is appended line 4

This is appended line 5




In [20]:
#reading file
filepath = '../../Datasets/pytestcr.txt'
with open(filepath) as fp:
   line = fp.readline()
   cnt = 1
   while line:
       print("Line {}: {}".format(cnt, line.strip()))
       line = fp.readline()
       cnt += 1

Line 1: This is line 1
Line 2: 
Line 3: This is line 2
Line 4: 
Line 5: This is line 3
Line 6: 
Line 7: This is line 4
Line 8: 
Line 9: This is line 5
Line 10: 
Line 11: This is line 6
Line 12: 
Line 13: This is line 7
Line 14: 
Line 15: This is line 8
Line 16: 
Line 17: This is line 9
Line 18: 
Line 19: This is line 10
Line 20: 
Line 21: This is appended line 1
Line 22: 
Line 23: This is appended line 2
Line 24: 
Line 25: This is appended line 3
Line 26: 
Line 27: This is appended line 4
Line 28: 
Line 29: This is appended line 5
Line 30: 
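
The readline() loop can also be written by iterating over the file object directly. This is a sketch on a temporary file (a hypothetical name, written with "\n" line endings), not a replacement for the cell above.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file
with open(path, "w") as f:
    for i in range(3):
        f.write("This is line %d\n" % (i + 1))

lines = []
with open(path) as fp:
    # A file object yields one line per iteration; enumerate supplies the counter.
    for cnt, line in enumerate(fp, start=1):
        lines.append("Line {}: {}".format(cnt, line.strip()))

print("\n".join(lines))
```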


In [21]:
#Using pandas and reading data from web instead of local machine
import pandas as pd

#option 1
df = pd.read_csv('https://raw.githubusercontent.com/ajaykuma/Datasets_For_Work/refs/heads/main/auction.csv')
df

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price,item,auction_type
0,1638893549,175.00,2.230949,schadenfreud,0.0,99.00,177.50,Cartier wristwatch,3 day auction
1,1638893549,100.00,2.600116,chuik,0.0,99.00,177.50,Cartier wristwatch,3 day auction
2,1638893549,120.00,2.600810,kiwisstuff,2.0,99.00,177.50,Cartier wristwatch,3 day auction
3,1638893549,150.00,2.601076,kiwisstuff,2.0,99.00,177.50,Cartier wristwatch,3 day auction
4,1638893549,177.50,2.909826,eli.flint@flightsafety.co,4.0,99.00,177.50,Cartier wristwatch,3 day auction
...,...,...,...,...,...,...,...,...,...
10676,8214889177,61.00,6.359155,714ark,15.0,0.01,90.01,Xbox game console,7 day auction
10677,8214889177,76.00,6.359294,rjdorman,1.0,0.01,90.01,Xbox game console,7 day auction
10678,8214889177,90.00,6.428738,baylorjeep,3.0,0.01,90.01,Xbox game console,7 day auction
10679,8214889177,88.00,6.760081,jasonjasonparis,18.0,0.01,90.01,Xbox game console,7 day auction


In [22]:
df.columns

Index(['auctionid', 'bid', 'bidtime', 'bidder', 'bidderrate', 'openbid',
       'price', 'item', 'auction_type'],
      dtype='object')

In [23]:
df.head()

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price,item,auction_type
0,1638893549,175.0,2.230949,schadenfreud,0.0,99.0,177.5,Cartier wristwatch,3 day auction
1,1638893549,100.0,2.600116,chuik,0.0,99.0,177.5,Cartier wristwatch,3 day auction
2,1638893549,120.0,2.60081,kiwisstuff,2.0,99.0,177.5,Cartier wristwatch,3 day auction
3,1638893549,150.0,2.601076,kiwisstuff,2.0,99.0,177.5,Cartier wristwatch,3 day auction
4,1638893549,177.5,2.909826,eli.flint@flightsafety.co,4.0,99.0,177.5,Cartier wristwatch,3 day auction


In [28]:
#option 2
#header=None: treat the first line as data; the columns get integer labels 0..8
df1 = pd.read_csv('../../Datasets/auction.csv',header=None)
df1.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price,item,auction_type
1,1638893549,175,2.230949,schadenfreud,0,99,177.5,Cartier wristwatch,3 day auction
2,1638893549,100,2.600116,chuik,0,99,177.5,Cartier wristwatch,3 day auction
3,1638893549,120,2.60081,kiwisstuff,2,99,177.5,Cartier wristwatch,3 day auction
4,1638893549,150,2.601076,kiwisstuff,2,99,177.5,Cartier wristwatch,3 day auction


In [27]:
#option 3
#passing column names; without header=0 the original header line is kept as the first data row
df2 = pd.read_csv('../../Datasets/auction.csv',names=['a', 'b', 'c', 'd', 'e','f','g','h','message'])
df2.head(2)

Unnamed: 0,a,b,c,d,e,f,g,h,message
0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price,item,auction_type
1,1638893549,175,2.230949,schadenfreud,0,99,177.5,Cartier wristwatch,3 day auction
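
If the goal is to replace the existing header rather than keep it as a data row, names can be combined with header=0. A minimal sketch on inline text; the three-column data is a shortened, hypothetical version of auction.csv.

```python
import io
import pandas as pd

# Shortened, hypothetical version of auction.csv.
text = (
    "auctionid,bid,item\n"
    "1638893549,175,Cartier wristwatch\n"
    "1638893549,100,Cartier wristwatch\n"
)

# header=0 marks line 0 as a header line, and names= replaces it.
renamed = pd.read_csv(io.StringIO(text), names=["a", "b", "message"], header=0)
print(renamed)
```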


In [26]:
#option 4
#making a column as index column
names=['a', 'b', 'c', 'd', 'e','f','g','h','message']
df3 = pd.read_csv('../../Datasets/auction.csv',
      names=names,index_col='message')
df3.head(2)

message,a,b,c,d,e,f,g,h
auction_type,auctionid,bid,bidtime,bidder,bidderrate,openbid,price,item
3 day auction,1638893549,175,2.230949,schadenfreud,0,99,177.5,Cartier wristwatch


In [29]:
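#option 5
#skiprows: skip lines by 0-based line number, here the header line and the 2nd and 3rd data rows,
#so the first data row ends up being used as the header
df4 = pd.read_csv('../../Datasets/auction.csv',skiprows=[0, 2, 3])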


In [30]:
df4.head(2)

Unnamed: 0,1638893549,175,2.230949,schadenfreud,0,99,177.5,Cartier wristwatch,3 day auction
0,1638893549,150.0,2.601076,kiwisstuff,2.0,99.0,177.5,Cartier wristwatch,3 day auction
1,1638893549,177.5,2.909826,eli.flint@flightsafety.co,4.0,99.0,177.5,Cartier wristwatch,3 day auction


In [32]:
#na_values: additional strings to treat as missing values (NaN)
df5 = pd.read_csv('../../Datasets/auction.csv',na_values=['NULL'])


In [35]:
opsd_data = pd.read_csv('../../Datasets/opsd_germany_daily.txt')

In [37]:
opsd_data

Unnamed: 0,Date,Consumption,Wind,Solar,Wind+Solar
0,2006-01-01,1069.18400,,,
1,2006-01-02,1380.52100,,,
2,2006-01-03,1442.53300,,,
3,2006-01-04,1457.21700,,,
4,2006-01-05,1477.13100,,,
...,...,...,...,...,...
4378,2017-12-27,1263.94091,394.507,16.530,411.037
4379,2017-12-28,1299.86398,506.424,14.162,520.586
4380,2017-12-29,1295.08753,584.277,29.854,614.131
4381,2017-12-30,1215.44897,721.247,7.467,728.714


In [45]:
opsd_data['Wind'] = opsd_data['Wind'].fillna(0)

In [47]:
opsd_data.head(2)

Unnamed: 0,Date,Consumption,Wind,Solar,Wind+Solar
0,2006-01-01,1069.184,0.0,,
1,2006-01-02,1380.521,0.0,,
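
In the output above only Wind was filled; Solar and Wind+Solar still hold NaN. fillna() also accepts a dict that maps column names to fill values. The sketch below uses a tiny made-up DataFrame, and filling with 0 is an assumption that may not suit real analysis.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame with missing values in two columns.
demo = pd.DataFrame({"Wind": [np.nan, 394.507], "Solar": [np.nan, 16.53]})
print(demo.isna().sum())   # count of missing values per column

# A dict maps each column to its own fill value.
filled = demo.fillna({"Wind": 0, "Solar": 0})
print(filled)
```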


In [48]:
#nrows: read only the first 5 data rows
pd.read_csv('../../Datasets/Bank_full.csv',nrows=5)


Unnamed: 0,serNo,age,job,marital,education,defaulter,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,1,58,management,married,tertiary,no,2143,yes,no,unknown,5,may,261,1,-1,0,unknown,no
1,2,44,technician,single,secondary,no,29,yes,no,unknown,5,may,151,1,-1,0,unknown,no
2,3,33,entrepreneur,married,secondary,no,2,yes,yes,unknown,5,may,76,1,-1,0,unknown,no
3,4,47,blue-collar,married,unknown,no,1506,yes,no,unknown,5,may,92,1,-1,0,unknown,no
4,5,33,unknown,single,unknown,no,1,no,no,unknown,5,may,198,1,-1,0,unknown,no


In [51]:
#chunksize: return an iterator that yields DataFrames of up to 1000 rows each
mydf = pd.read_csv('../../Datasets/Bank_full.csv',chunksize=1000)

In [52]:
type(mydf)

pandas.io.parsers.readers.TextFileReader

In [55]:
#for i in mydf:
#    print(i)
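
With chunksize set, each item produced by the loop is an ordinary DataFrame, so results can be accumulated chunk by chunk without loading the whole file. A sketch on made-up inline data; the column name balance is borrowed from Bank_full.csv, but the values are invented.

```python
import io
import pandas as pd

# Hypothetical data: 10 rows with a single numeric column.
text = "balance\n" + "\n".join(str(n) for n in range(10)) + "\n"

total = 0
sizes = []
# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(text), chunksize=4):
    sizes.append(len(chunk))
    total += chunk["balance"].sum()

print(sizes, total)
```

The reader can only be iterated once; call read_csv() again to make a second pass.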

In [57]:
#option 1
df = pd.read_csv('https://raw.githubusercontent.com/ajaykuma/Datasets_For_Work/refs/heads/main/auction.csv')
df.head(3)

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price,item,auction_type
0,1638893549,175.0,2.230949,schadenfreud,0.0,99.0,177.5,Cartier wristwatch,3 day auction
1,1638893549,100.0,2.600116,chuik,0.0,99.0,177.5,Cartier wristwatch,3 day auction
2,1638893549,120.0,2.60081,kiwisstuff,2.0,99.0,177.5,Cartier wristwatch,3 day auction


In [58]:
#writing data out (the index is written as the first column unless index=False is passed)
df.to_csv('../../Datasets/auction_mod.csv')

In [61]:
import sys
#The variants below write to standard output instead of a file; uncomment one to try it.
#df.to_csv(sys.stdout,sep = '|')
#df.to_csv(sys.stdout,sep = '|',na_rep='NULL')
#df.to_csv(sys.stdout,sep = '|',index=False, header=False)
#df.to_csv(sys.stdout,sep = '|',index=False, columns=['auctionid','bid','price'])

In [63]:
df.to_csv('../../Datasets/auction_mod2.csv',sep = '|')
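
A file written with a custom separator has to be read back with the same one. The sketch below round-trips a small made-up DataFrame through an in-memory buffer rather than auction_mod2.csv.

```python
import io
import pandas as pd

# Small hypothetical frame; the values echo the auction data shown earlier.
df_demo = pd.DataFrame({"item": ["Cartier wristwatch", "Xbox game console"],
                        "price": [177.5, 90.01]})

buf = io.StringIO()
df_demo.to_csv(buf, sep="|", index=False)
buf.seek(0)

# The same separator must be passed when reading the data back.
round_trip = pd.read_csv(buf, sep="|")
print(round_trip)
```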