---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.11 (Pandas-03)</h1>

## _IO with CSV Files.ipynb_

#### Read Pandas Documentation:
- General Info: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html


- For `read_csv`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv


- For `read_table`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html?highlight=read_table#pandas.read_table


- For `to_csv`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv


## Learning agenda of this notebook
[Pandas](https://pandas.pydata.org/) is a popular Python library for working with tabular data (similar to the data stored in a spreadsheet). Pandas provides helper functions to read data from various file formats such as CSV, Excel spreadsheets, HTML tables, JSON, SQL, and more.
1. Reading a simple CSV File in Pandas Dataframe
2. Reading a CSV File having a Delimiter other than default
3. Reading a CSV file not having column labels
4. Reading a CSV File having Comments in the beginning or end
5. Reading a portion of a CSV File in a Dataframe
6. Reading a csv file from Remote Systems
7. Writing Contents of Dataframe to CSV file

In [16]:
# To install this library in Jupyter notebook
import sys
!{sys.executable} -m pip install pandas --quiet

In [1]:
import pandas as pd
pd.__version__ , pd.__path__

('1.3.4',
 ['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas'])

## 1. Reading a Simple CSV File in Pandas Dataframe

>-**CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields.
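
The definition above can be illustrated with a minimal sketch. The CSV text and the names `csv_text` and `demo` are made up for illustration and are not part of the course datasets; `io.StringIO` is used only so that the example needs no file on disk.

```python
import io
import pandas as pd

# A tiny, made-up CSV document held in memory (hypothetical data):
# one header line followed by three records, each with three comma-separated fields
csv_text = """rollno,gender,math
MS01,female,72
MS02,female,69
MS03,male,47
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.shape)          # (number of records, number of fields)
print(list(demo.columns))  # column labels taken from the first line
```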

In [35]:
# By default, `read_csv` assumes that the file contains comma-separated values
# and that the first row of the file contains the column names, which are taken as column labels
df = pd.read_csv('../course-datasets/big_mart_sales.csv')
df

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDA15,9.300,Low Fat,0.016047,Dairy,249.8092,OUT049,1999,Medium,Tier 1,Supermarket Type1,3735.1380
1,DRC01,5.920,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
2,FDN15,17.500,Low Fat,0.016760,Meat,141.6180,OUT049,1999,Medium,Tier 1,Supermarket Type1,2097.2700
3,FDX07,19.200,Regular,0.000000,Fruits and Vegetables,182.0950,OUT010,1998,,Tier 3,Grocery Store,732.3800
4,NCD19,8.930,Low Fat,0.000000,Household,53.8614,OUT013,1987,High,Tier 3,Supermarket Type1,994.7052
...,...,...,...,...,...,...,...,...,...,...,...,...
8518,FDF22,6.865,Low Fat,0.056783,Snack Foods,214.5218,OUT013,1987,High,Tier 3,Supermarket Type1,2778.3834
8519,FDS36,8.380,Regular,0.046982,Baking Goods,108.1570,OUT045,2002,,Tier 2,Supermarket Type1,549.2850
8520,NCJ29,10.600,Low Fat,0.035186,Health and Hygiene,85.1224,OUT035,2004,Small,Tier 2,Supermarket Type1,1193.1136
8521,FDN46,7.210,Regular,0.145221,Snack Foods,103.1332,OUT018,2009,Medium,Tier 3,Supermarket Type2,1845.5976


### a. Check some Attributes to get an Insight about the Data in the Dataframe
- After reading the dataset, the first thing that we check is the size of the dataset. To get the number of rows and columns present in the dataset we have the shape attribute.

In [17]:
# to check the dimension of the data set, we can use the shape attribute
df.shape

(8523, 12)

In [28]:
# display row labels  of the dataframe
df.index

RangeIndex(start=0, stop=8523, step=1)

In [29]:
# display column labels of a dataframe
df.columns

Index(['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility',
       'Item_Type', 'Item_MRP', 'Outlet_Identifier',
       'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type',
       'Outlet_Type', 'Item_Outlet_Sales'],
      dtype='object')

In [30]:
# display data types of each column in the dataframe
df.dtypes

Item_Identifier               object
Item_Weight                  float64
Item_Fat_Content              object
Item_Visibility              float64
Item_Type                     object
Item_MRP                     float64
Outlet_Identifier             object
Outlet_Establishment_Year      int64
Outlet_Size                   object
Outlet_Location_Type          object
Outlet_Type                   object
Item_Outlet_Sales            float64
dtype: object

In [31]:
# display a NumPy ndarray having all the values in the DataFrame, without the axes labels
df.values

array([['FDA15', 9.3, 'Low Fat', ..., 'Tier 1', 'Supermarket Type1',
        3735.138],
       ['DRC01', 5.92, 'Regular', ..., 'Tier 3', 'Supermarket Type2',
        443.4228],
       ['FDN15', 17.5, 'Low Fat', ..., 'Tier 1', 'Supermarket Type1',
        2097.27],
       ...,
       ['NCJ29', 10.6, 'Low Fat', ..., 'Tier 2', 'Supermarket Type1',
        1193.1136],
       ['FDN46', 7.21, 'Regular', ..., 'Tier 3', 'Supermarket Type2',
        1845.5976],
       ['DRG01', 14.8, 'Low Fat', ..., 'Tier 1', 'Supermarket Type1',
        765.67]], dtype=object)

In [33]:
# return number of elements in the underlying data
df.size

102276
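
How these attributes relate to each other can be checked on a small dataframe. The sketch below uses a made-up dataframe named `small` (hypothetical data). It also uses `to_numpy()`, which the pandas documentation recommends over `.values` for getting the underlying array.

```python
import pandas as pd

# Hypothetical three-row, two-column dataframe used only for illustration
small = pd.DataFrame({"item": ["A", "B", "C"], "price": [1.5, 2.0, 3.25]})

rows, cols = small.shape       # shape is a (rows, columns) tuple
print(rows, cols)
print(small.size)              # size is the total number of cells: rows * columns
print(len(small))              # len() gives the number of rows only

values = small.to_numpy()      # NumPy array of the cell values, without axis labels
print(values.shape)
```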

###  b. Use `df.head()` and `df.tail()` to read first/last 'N' Rows
- The `head()` and `tail()` methods select the rows/records of a dataframe based on position, i.e., the integer value corresponding to the position of the row (from 0 to n-1).

In [19]:
# head(n) returns the first n rows for the object based on position.  Default value of n is 5
df.head()

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDA15,9.3,Low Fat,0.016047,Dairy,249.8092,OUT049,1999,Medium,Tier 1,Supermarket Type1,3735.138
1,DRC01,5.92,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
2,FDN15,17.5,Low Fat,0.01676,Meat,141.618,OUT049,1999,Medium,Tier 1,Supermarket Type1,2097.27
3,FDX07,19.2,Regular,0.0,Fruits and Vegetables,182.095,OUT010,1998,,Tier 3,Grocery Store,732.38
4,NCD19,8.93,Low Fat,0.0,Household,53.8614,OUT013,1987,High,Tier 3,Supermarket Type1,994.7052


In [20]:
df.head(3)

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDA15,9.3,Low Fat,0.016047,Dairy,249.8092,OUT049,1999,Medium,Tier 1,Supermarket Type1,3735.138
1,DRC01,5.92,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
2,FDN15,17.5,Low Fat,0.01676,Meat,141.618,OUT049,1999,Medium,Tier 1,Supermarket Type1,2097.27


In [26]:
# For negative values of n, head(n) returns all rows except the last |n| rows, equivalent to df[:n].
df.head(-8520)

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDA15,9.3,Low Fat,0.016047,Dairy,249.8092,OUT049,1999,Medium,Tier 1,Supermarket Type1,3735.138
1,DRC01,5.92,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
2,FDN15,17.5,Low Fat,0.01676,Meat,141.618,OUT049,1999,Medium,Tier 1,Supermarket Type1,2097.27


In [22]:
# tail(n) function returns last n rows from the object based on position. Default value of n is 5
# It is useful for quickly verifying data, for example, after sorting or appending rows.
df.tail(3)

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
8520,NCJ29,10.6,Low Fat,0.035186,Health and Hygiene,85.1224,OUT035,2004,Small,Tier 2,Supermarket Type1,1193.1136
8521,FDN46,7.21,Regular,0.145221,Snack Foods,103.1332,OUT018,2009,Medium,Tier 3,Supermarket Type2,1845.5976
8522,DRG01,14.8,Low Fat,0.044878,Soft Drinks,75.467,OUT046,1997,Small,Tier 1,Supermarket Type1,765.67


In [27]:
# For negative values of `n`, tail(n) returns all rows except the first |n| rows, equivalent to df[abs(n):]
df.tail(-8520)

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
8520,NCJ29,10.6,Low Fat,0.035186,Health and Hygiene,85.1224,OUT035,2004,Small,Tier 2,Supermarket Type1,1193.1136
8521,FDN46,7.21,Regular,0.145221,Snack Foods,103.1332,OUT018,2009,Medium,Tier 3,Supermarket Type2,1845.5976
8522,DRG01,14.8,Low Fat,0.044878,Soft Drinks,75.467,OUT046,1997,Small,Tier 1,Supermarket Type1,765.67
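
The behaviour of `head()` and `tail()` with a negative `n` is easier to verify on a small dataframe. The sketch below uses a made-up ten-row dataframe named `demo` (hypothetical data) and compares the results with the equivalent positional slices.

```python
import pandas as pd

# Hypothetical dataframe with ten rows, positions 0 to 9
demo = pd.DataFrame({"x": range(10)})
n = -3

first_part = demo.head(n)   # everything except the last 3 rows
last_part = demo.tail(n)    # everything except the first 3 rows

# Each call matches the corresponding positional slice
print(first_part.equals(demo.iloc[:n]))
print(last_part.equals(demo.iloc[abs(n):]))
```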


## 2. Reading a CSV File having a Delimiter other than Comma
- By default, `read_csv()` expects a comma as the separator. If the CSV file uses some other separator or delimiter (such as a semicolon or a tab), we need to specify it using the `delimiter` parameter (an alias of `sep`) of the `read_csv()` method.

In [37]:
# Example: Try reading a csv file having fields separated with tabs, without specifying the delimiter
df = pd.read_csv('datasets/big_mart_sales_delimiter.csv')

# view the content
df.head()

Unnamed: 0,Item_Identifier\tItem_Weight\tItem_Fat_Content\tItem_Visibility\tItem_Type\tItem_MRP\tOutlet_Identifier\tOutlet_Establishment_Year\tOutlet_Size\tOutlet_Location_Type\tOutlet_Type\tItem_Outlet_Sales
0,FDA15\t9.3\tLow Fat\t0.016047301\tDairy\t249.8...
1,DRC01\t5.92\tRegular\t0.019278216\tSoft Drinks...
2,FDN15\t17.5\tLow Fat\t0.016760075\tMeat\t141.6...
3,FDX07\t19.2\tRegular\t0.0\tFruits and Vegetabl...
4,NCD19\t8.93\tLow Fat\t0.0\tHousehold\t53.8614\...


In [39]:
# Example: Read the same tab-separated file again, this time passing delimiter='\t'
df = pd.read_csv('datasets/big_mart_sales_delimiter.csv', delimiter='\t')

# view the content
df.head()

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDA15,9.3,Low Fat,0.016047,Dairy,249.8092,OUT049,1999,Medium,Tier 1,Supermarket Type1,3735.138
1,DRC01,5.92,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
2,FDN15,17.5,Low Fat,0.01676,Meat,141.618,OUT049,1999,Medium,Tier 1,Supermarket Type1,2097.27
3,FDX07,19.2,Regular,0.0,Fruits and Vegetables,182.095,OUT010,1998,,Tier 3,Grocery Store,732.38
4,NCD19,8.93,Low Fat,0.0,Household,53.8614,OUT013,1987,High,Tier 3,Supermarket Type1,994.7052
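
The same idea applies to any single-character delimiter. The sketch below uses made-up in-memory text (the names `semi_text` and `tab_text` are hypothetical). It reads a semicolon-separated version with `sep=';'` and a tab-separated version with `read_table()`, whose default separator is a tab.

```python
import io
import pandas as pd

# Hypothetical data: the same small table written with two different delimiters
semi_text = "rollno;gender;math\nMS01;female;72\nMS02;male;47\n"
tab_text = semi_text.replace(";", "\t")

# sep and delimiter are interchangeable names for the same read_csv parameter
by_sep = pd.read_csv(io.StringIO(semi_text), sep=";")

# read_table behaves like read_csv but uses a tab as its default separator
by_table = pd.read_table(io.StringIO(tab_text))

print(list(by_sep.columns))
print(list(by_table.columns))
```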


## 3. Reading a CSV File not having Column Labels
- By default, the `read_csv()` method assumes that the first row of the file contains column labels
- If this is not the case, i.e., the file does not contain column labels but starts directly with data, the first data row will be treated as the column labels
- The following example illustrates this

In [88]:
df = pd.read_csv('../course-datasets/classmarkswithoutcollabels.csv')
df.head()


Unnamed: 0,MS01,female,group B,28,72,72.1,74
0,MS02,female,group C,33.0,69.0,90,88.0
1,MS03,female,group B,21.0,,95,93.0
2,MS04,male,group A,44.0,47.0,57,44.0
3,MS05,male,group C,54.0,76.0,78,
4,MS06,female,group B,,71.0,83,78.0


**To read such files, you have to pass the parameter `header=None` to the `read_csv()` method as shown below**

In [90]:
df = pd.read_csv('../course-datasets/classmarkswithoutcollabels.csv', header=None)
df.head()


Unnamed: 0,0,1,2,3,4,5,6
0,MS01,female,group B,28.0,72.0,72,74.0
1,MS02,female,group C,33.0,69.0,90,88.0
2,MS03,female,group B,21.0,,95,93.0
3,MS04,male,group A,44.0,47.0,57,44.0
4,MS05,male,group C,54.0,76.0,78,


**Now if you want to assign new column labels to make them more understandable, you can assign a list of names to the `columns` attribute of the dataframe as shown below**

In [92]:
col_names = ['rollno', 'gender', 'group', 'age', 'math', 'english', 'urdu']
df.columns = col_names
df.head()

Unnamed: 0,rollno,gender,group,age,math,english,urdu
0,MS01,female,group B,28.0,72.0,72,74.0
1,MS02,female,group C,33.0,69.0,90,88.0
2,MS03,female,group B,21.0,,95,93.0
3,MS04,male,group A,44.0,47.0,57,44.0
4,MS05,male,group C,54.0,76.0,78,
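
As an alternative to assigning the labels afterwards, the labels can be supplied while reading by passing the `names` parameter of `read_csv()`. This is a minimal sketch on made-up in-memory data; the variable `raw` and its contents are hypothetical and are not the course file.

```python
import io
import pandas as pd

# Hypothetical header-less data: the first line is already a record
raw = "MS01,female,72\nMS02,male,47\n"
col_names = ["rollno", "gender", "math"]

# header=None says the file has no label row; names supplies the labels to use
df_named = pd.read_csv(io.StringIO(raw), header=None, names=col_names)

print(list(df_named.columns))
print(len(df_named))   # both lines are kept as data rows
```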


## 4. Reading a CSV File having Comments in the beginning
- You may get an error while reading a CSV file because someone has added a few comment lines at the top of the file. In pandas we can still read the dataset by skipping those rows.
- To deal with the `ParserError`, open the csv file in a text editor and check whether there are comments at the top.
- If yes, then count the number of rows to skip.
- While reading the file, pass the parameter **skiprows = n** (number of rows at the beginning having comments to skip)
- Similarly, pass the parameter **skipfooter = n** (number of rows at the end having comments to skip); this parameter is supported by the Python parsing engine (`engine='python'`)

In [40]:
# Example: Try reading a csv file having 5 comment lines in the beginning, without skipping them.
df = pd.read_csv('datasets/big_mart_sales_top_row_error.csv')

# view the data
df.head()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2


In [41]:
# Example: Read the same csv file again, this time skipping the 5 comment lines in the beginning.

df = pd.read_csv('datasets/big_mart_sales_top_row_error.csv', skiprows= 5)

# view the data
df.head()

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDA15,9.3,Low Fat,0.016047,Dairy,249.8092,OUT049,1999,Medium,Tier 1,Supermarket Type1,3735.138
1,DRC01,5.92,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
2,FDN15,17.5,Low Fat,0.01676,Meat,141.618,OUT049,1999,Medium,Tier 1,Supermarket Type1,2097.27
3,FDX07,19.2,Regular,0.0,Fruits and Vegetables,182.095,OUT010,1998,,Tier 3,Grocery Store,732.38
4,NCD19,8.93,Low Fat,0.0,Household,53.8614,OUT013,1987,High,Tier 3,Supermarket Type1,994.7052
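
The `skipfooter` parameter mentioned above is not demonstrated in this notebook, so here is a minimal sketch on made-up in-memory text (all strings and variable names are hypothetical). It also shows the `comment` parameter, which may be convenient when every comment line starts with the same character, because the lines then do not have to be counted.

```python
import io
import pandas as pd

# Hypothetical file contents: two comment lines, a small table, and one trailing note
header_comments = "# exported by a hypothetical tool\n# second comment line\n"
table = "rollno,math\nMS01,72\nMS02,47\n"
footer = "end of report\n"

# skiprows drops the two leading lines and skipfooter drops the trailing line.
# skipfooter is not supported by the default C engine, so the python engine is requested.
trimmed = pd.read_csv(io.StringIO(header_comments + table + footer),
                      skiprows=2, skipfooter=1, engine="python")

# comment='#' ignores lines that start with '#', without counting them
by_comment = pd.read_csv(io.StringIO(header_comments + table), comment="#")

print(trimmed.shape)
print(by_comment.shape)
```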


## 5. Reading a portion of CSV File in a Dataframe
- Suppose the dataset inside the csv file is too big and you don't want to spend that much time reading the data
- Or your system might crash when you try to load that much data
- The solution is to read only
    - a specific number of rows, by passing the `nrows` parameter to the `read_csv()` method
    - specific columns, by passing the `usecols` parameter to the `read_csv()` method


In [43]:
# Example: Read just 100 rows from the csv file
# In the read_csv() function, use the parameter **nrows=100** for this purpose

df = pd.read_csv('datasets/big_mart_sales.csv',nrows=100)

# check the shape of the data
df.shape

(100, 12)

In [44]:
# Example: Read specific columns from the csv file
# In read_csv() function, use the parameter usecols for this purpose
df = pd.read_csv('datasets/big_mart_sales.csv',
                 usecols=['Item_Identifier', 'Item_Type', 'Item_MRP', 'Item_Outlet_Sales'])


# check the shape of the data
df.shape
# view the data
#df.head()

(8523, 4)

In [45]:
# Of course, you can use both parameters at the same time
fd = pd.read_csv('datasets/big_mart_sales.csv', nrows=25,
                 usecols=['Item_Identifier', 'Item_Type', 'Item_MRP', 'Item_Outlet_Sales'])


# check the shape of the data
fd.shape
# view the data
#fd.head()

(25, 4)
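
When the whole file is needed but does not fit comfortably in memory, another option is to process it piece by piece with the `chunksize` parameter of `read_csv()`. The sketch below is a minimal illustration on made-up in-memory data (the names `csv_text`, `chunk_sizes` and `total` are hypothetical). With a real file you would pass its path instead of the `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical CSV text with a header line and ten data rows
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

chunk_sizes = []
total = 0

# With chunksize set, read_csv returns an iterator that yields
# dataframes of at most 4 rows each instead of one large dataframe
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    total += chunk["value"].sum()   # aggregate without holding all rows at once

print(chunk_sizes)
print(total)
```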

**Exercise: Try reading a portion of a file in which the column names are not mentioned and the first row contains data**

## 6. Reading a csv file from a Remote System

In [1]:
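# To avoid URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED].....
# Note: this disables HTTPS certificate verification for the whole session, so use it only with trusted URLs
import ssl
ssl._create_default_https_context = ssl._create_unverified_context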

**Example 1: Reading a csv file from Web**

In [2]:
import pandas as pd
df = pd.read_csv('http://bit.ly/chiporders', sep='\t')
df.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


**Example 2: Reading a csv file from a Public GitHub Gist**

In [3]:
myurl = "https://gist.githubusercontent.com/arifpucit/bbcb0bba0b5c245585b375f273f17876/raw/28ddebe991c86f7001178896329005ea174f2bde/data1.csv"


In [4]:
df = pd.read_csv(myurl)
df

Unnamed: 0,Names,Age,Addr
0,Arif,50,Lahore
1,Rauf,52,Islamabad
2,Maaz,27,Peshawer
3,Hadeed,22,Islamabad
4,Mujahid,18,Karachi


In [61]:
myurl = 'https://gist.githubusercontent.com/arifpucit/3902862f8756c0884bdbcfa8f39f06cd/raw/38bfc682efc3ff973373e1e22aeb5666849db328/italy-covid-daywise.csv'


In [62]:
df = pd.read_csv(myurl)
df

Unnamed: 0,date,new_cases,new_deaths,new_tests
0,2019-12-31,0.0,0.0,
1,2020-01-01,0.0,0.0,
2,2020-01-02,0.0,0.0,
3,2020-01-03,0.0,0.0,
4,2020-01-04,0.0,0.0,
...,...,...,...,...
243,2020-08-30,1444.0,1.0,53541.0
244,2020-08-31,1365.0,4.0,42583.0
245,2020-09-01,996.0,6.0,54395.0
246,2020-09-02,975.0,8.0,


**Example 3: Reading a csv file from Google Sheets**
- Google Sheet URL:  https://docs.google.com/spreadsheets/d/1H9ZTGVRXN3zuyP3cbJQnbX7JlqhqFNMOE_WwDP-PRTE/edit#gid=2084742287


In [81]:
sheetID = '1H9ZTGVRXN3zuyP3cbJQnbX7JlqhqFNMOE_WwDP-PRTE'
sheetName = 'sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(sheetID, sheetName)

df = pd.read_csv(URL)
df.head()

Unnamed: 0,rollno,gender,group,age,math,english,urdu
0,MS01,female,group B,28.0,72.0,72,74.0
1,MS02,female,group C,33.0,69.0,90,88.0
2,MS03,female,group B,21.0,,95,93.0
3,MS04,male,group A,44.0,47.0,57,44.0
4,MS05,male,group C,54.0,76.0,78,


## 7. Writing Contents of Dataframe to a CSV File
- Let us write/save the above dataframe to a file named classmarks.csv in the course-datasets directory using the dataframe's `to_csv()` method

### a. Write Dataframe with Index to csv file and confirm

In [84]:
df.to_csv('../course-datasets/classmarkswithindex.csv')
df1 = pd.read_csv('../course-datasets/classmarkswithindex.csv')
df1.head()

Unnamed: 0.1,Unnamed: 0,rollno,gender,group,age,math,english,urdu
0,0,MS01,female,group B,28.0,72.0,72,74.0
1,1,MS02,female,group C,33.0,69.0,90,88.0
2,2,MS03,female,group B,21.0,,95,93.0
3,3,MS04,male,group A,44.0,47.0,57,44.0
4,4,MS05,male,group C,54.0,76.0,78,
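
If the index has been written to the file, one way to avoid the extra `Unnamed: 0` column when reading it back is the `index_col` parameter of `read_csv()`. The sketch below is a minimal illustration that writes to an in-memory buffer instead of a file; the dataframe `marks` and its values are made up.

```python
import io
import pandas as pd

# Hypothetical dataframe standing in for the class marks data
marks = pd.DataFrame({"rollno": ["MS01", "MS02"], "math": [72.0, 47.0]})

# Write with the index (the default), using a buffer in place of a file path
buffer = io.StringIO()
marks.to_csv(buffer)

# Reading back naively turns the saved index into an 'Unnamed: 0' column
buffer.seek(0)
naive = pd.read_csv(buffer)

# index_col=0 tells read_csv to use the first column as the row index instead
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```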


### b. Write Dataframe without Index to csv file and confirm

In [85]:
df.to_csv('../course-datasets/classmarks.csv', index=False)
df2 = pd.read_csv('../course-datasets/classmarks.csv')
df2.head()


Unnamed: 0,rollno,gender,group,age,math,english,urdu
0,MS01,female,group B,28.0,72.0,72,74.0
1,MS02,female,group C,33.0,69.0,90,88.0
2,MS03,female,group B,21.0,,95,93.0
3,MS04,male,group A,44.0,47.0,57,44.0
4,MS05,male,group C,54.0,76.0,78,


### c. Write Dataframe to csv file with separator other than Default

In [86]:
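df.to_csv('../course-datasets/classmarkswithtab.csv', index=False, sep='\t')
# Reading the file back without sep='\t' puts every row into a single column, as the output below shows
df3 = pd.read_csv('../course-datasets/classmarkswithtab.csv')
df3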

Unnamed: 0,rollno\tgender\tgroup\tage\tmath\tenglish\turdu
0,MS01\tfemale\tgroup B\t28.0\t72.0\t72\t74.0
1,MS02\tfemale\tgroup C\t33.0\t69.0\t90\t88.0
2,MS03\tfemale\tgroup B\t21.0\t\t95\t93.0
3,MS04\tmale\tgroup A\t44.0\t47.0\t57\t44.0
4,MS05\tmale\tgroup C\t54.0\t76.0\t78\t
5,MS06\tfemale\tgroup B\t\t71.0\t83\t78.0
6,MS07\tfemale\tgroup B\t47.0\t88.0\t95\t92.0
7,MS08\tmale\tgroup B\t33.0\t40.0\t43\t39.0
8,MS09\tmale\tgroup D\t27.0\t64.0\t64\t67.0
9,MS10\tfemale\tgroup B\t33.0\t38.0\t60\t50.0


## Check your Concepts

Try answering the following questions to test your understanding of the topics covered in this notebook:

1. What is the purpose of the `os` module in Python?
2. How do you identify the current working directory in a Jupyter notebook?
3. How do you retrieve the list of files within a directory using Python?
4. How do you create a directory using Python?
5. How do you check whether a file or directory exists on the filesystem? Hint: `os.path.exists`.
6. Where can you find the full list of functions contained in the `os` module?
7. Give examples of 5 useful functions from the `os` and `os.path` modules.
8. How do you download a file from a URL using Python?
9. How do you open a file using Python? Give an example.
10. What are the different modes for opening a file in Python?
11. Can you open a file in multiple modes? Illustrate with an example.
12. What is the file object? How is it useful?
13. How do you read the contents of a file into a string?
14. What is a CSV file? Give an example.
15. How do you close an open file?
16. Why is it essential to close a file after processing it?
17. How do you ensure that files are closed automatically after processing? Give an example.
18. How is the `with` statement useful for working with files?
19. What happens if you try to read from a closed file?
20. How do you read the contents of a file line by line?
21. Write a function to convert the contents of a CSV file into a list of dictionaries (one dictionary for each row of the file).
22. Write a function to convert the contents of a CSV file into a dictionary of lists (one list for each column of the file).
23. How do you write to a file using Python?
24. How is the string `.format` method useful for writing data to a file in CSV format?
25. Write a function to write data from a list of dictionaries into a CSV file.
26. Write a function to write data from a dictionary of lists into a CSV file.
27. Where can you learn about the methods supported by the file object in Python?
28. How can you read from and write to CSV files using Pandas?
