#### Clarusway Python

* [Instructor Landing Page](landing_page.ipynb)
* <a href="https://colab.research.google.com/github/4dsolutions/clarusway_data_analysis/blob/main/basic_python/22.Python_Session22.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>
* [![nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/4dsolutions/clarusway_data_analysis/blob/main/basic_python/22.Python_Session22.ipynb)

<a id="toc"></a>

## <p style="background-color:#0D8D99; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Python Session 22 - Working with Files</p>

#### <div class="alert alert-block alert-info"><h1><p style="text-align: center; color:purple"> Open a File<br><br>Reading Files<br><br>Writing Files<br><br>Working with CSV Files</p></h1></div>

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Open a file</p>

## The methods to open a file in Python:

**open() Function:** The most basic way to open a file is the built-in "open()" function. It takes the file name and a mode (for example "r" for reading or "w" for writing) and returns a file object.

**with open() Statement:** The "with open()" statement is used to open and automatically close a file. This statement uses the "open()" function to open the file and automatically closes it at the end of the code block.

**pathlib Library:** Starting from Python 3.4, the pathlib library provides a more modern way of file handling. This library is used to manage file paths and open files.

**os Library:** The "os" library provides more advanced functions for file handling. This library is used to manage file names and paths, change file permissions, and delete files.

**io Library:** The "io" library provides more advanced tools for working with files. This library is used to manage file input and output, keep file content in memory, and make file handling more efficient.

<hr>

**In this notebook we will open files with the 'with open' statement, so that we do not have to close them manually. First, however, let's see a simple example of opening a file with each of these libraries.**
<hr>

### pathlib library:

In [None]:
from pathlib import Path

# file path:
file_path = Path("fishes.txt")

# open the file:
with file_path.open(mode="r") as f:
    file_content = f.read()

print(file_content)
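
For short files, `pathlib` also offers the `read_text()` and `write_text()` methods, which open, process, and close the file in a single call. The sketch below is only an illustration: it works in a temporary directory, and the file name `sample.txt` is a hypothetical one chosen for the example.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"  # hypothetical file name

    # write_text() opens the file, writes the string, and closes the file.
    sample_path.write_text("salmon\ntrout\n", encoding="utf-8")

    # read_text() opens the file, returns its content, and closes the file.
    sample_content = sample_path.read_text(encoding="utf-8")
    sample_exists = sample_path.exists()

print(sample_content)
```

These helpers are convenient when the whole content fits comfortably in memory; for larger files, `Path.open()` with a `with` block remains the safer choice.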

### os library:

In [None]:
import os

# file path:
file_path = "fishes.txt"

# open the file:
file_descriptor = os.open(file_path, os.O_RDONLY)

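# read the content of the file (os.read returns bytes, not str):
content = os.read(file_descriptor, os.stat(file_path).st_size)

# close the file:
os.close(file_descriptor)

# decode the bytes (the file is assumed to be UTF-8) and print the content:
print(content.decode("utf-8"))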

### io library:

In [None]:
import io

file_contents = "This is a sample file created using 'io' library."
file = io.StringIO(file_contents)

# Read the content of the file
content = file.read()

# Print the content of the file
print(content)

# Close the file
file.close()

<hr>

A data analyst frequently needs to perform file handling operations, and therefore they may often use the **"open()"** and **"with open()"** methods. 

**As the "with open()" statement is a safer and more organized way to open and close files, data analysts usually prefer to use this method.**

In addition, data analysts frequently use the **"os" library** to manage file paths, change file permissions, delete files, and perform similar operations. 

They may also often use the **"io" library** to read, write, and keep file content in memory.
<hr>

## Differences between "open" and "with open":

**File Closing:** When using "open", we need to manually close the file when we are done with it. However, when using "with open", the file is automatically closed, and there is no need for manual file closing.

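**Error Handling:** When using "open", we must make sure ourselves (for example with try/finally) that the file is closed if an error occurs. When using "with open", the file is closed automatically even if an error occurs inside the block. The exception is still raised afterwards, so it can be handled as usual.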

**Readability:** Using "with open" can make the code more readable. The entire code block required for file processing is located under "with open", making the code appear more organized.

**Resource Usage:** Using "with open" does not make reading or writing faster, but it releases the file as soon as the block ends, so file handles and buffers are not held longer than necessary.

**Safer:** Using "with open" is safer because the file is closed automatically when the block ends. There is no chance of forgetting to close the file, which could otherwise leave data unwritten or keep the file locked.
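
The automatic closing described above can be checked with the `closed` attribute of a file object. The following is a minimal sketch, not part of the original session: it creates a hypothetical `demo.txt` in a temporary directory and raises an error on purpose inside the `with` block.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.txt"  # hypothetical file name
    demo_path.write_text("first line\n", encoding="utf-8")

    try:
        with open(demo_path, "r", encoding="utf-8") as demo_file:
            first_line = demo_file.readline()
            # Simulate a problem that happens while the file is open.
            raise ValueError("simulated problem while processing the file")
    except ValueError as error:
        error_message = str(error)

    # The with block has already closed the file, despite the error.
    closed_after_error = demo_file.closed

print(closed_after_error)
```

Note that the exception still reaches the `except` clause: the `with` statement closes the file, but it does not handle the error for us.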

## open() built-in function:

[Python official documentation - built-in functions](https://docs.python.org/3/library/functions.html)

**After accessing the above link, click on the open() function**

## Exception Handling in Files:

During file operations in Python, many errors can occur, such as file not found errors, permission errors, and errors related to keeping the file open.

If an exception occurs when we are performing some operation with the file, the code exits without closing the file. **This can cause the program to crash unexpectedly or produce incorrect results.** 


**Therefore, we use exception handling to deal with error conditions during file operations, and close the file in the finally block.**

The finally block is always executed after the try/except blocks are completed and is used to release resources like files.

Examples:

In [None]:
# open the file before the try block: if open() fails, there is nothing to close
file = open("fishes.txt", "r")
try:
    read_content = file.read()
    print(read_content)

finally:
    # close the file under all circumstances:
    file.close()

In [None]:
file = None
try:
    file = open("fishes2.txt", "w")  # fishes2.txt is a read-only file.
    file.write("This text is being attempted to be written to the file.")
except FileNotFoundError:
    print("File not found.")
except PermissionError:
    print("You do not have permission to access this file.")
except Exception as e:
    print("An unexpected error occurred: ", e)
finally:
    # close the file only if it was actually opened:
    if file is not None:
        file.close()

In this example, a PermissionError is raised because we try to open a read-only file for writing. The error occurs inside open() itself, so no file object is created. The finally block still executes, and it closes the file only if one was actually opened.
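
The same error handling can be combined with the `with` statement, so that no explicit `finally` block is needed. The sketch below is one possible arrangement, not the only one: the helper name `read_text_or_none` and the file names are hypothetical, and a temporary directory is used so that the example does not depend on any existing file.

```python
import os
import tempfile


def read_text_or_none(path):
    """Return the content of a text file, or None if it cannot be opened."""
    try:
        # The with statement closes the file even if read() fails.
        with open(path, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        print("File not found:", path)
    except PermissionError:
        print("You do not have permission to access:", path)
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    # A path that does not exist (hypothetical name).
    missing_path = os.path.join(tmp_dir, "no_such_file.txt")
    missing_result = read_text_or_none(missing_path)

    # A file that does exist.
    existing_path = os.path.join(tmp_dir, "note.txt")
    with open(existing_path, "w", encoding="utf-8") as f:
        f.write("hello")
    existing_result = read_text_or_none(existing_path)

print(missing_result, existing_result)
```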

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Reading Files</p>

[Python official document - open() function](https://docs.python.org/3/library/functions.html#open)

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file)

# read() method

**Read at most n characters from stream.**

**Read from underlying buffer until we have n characters or we hit EOF.**

**If n is negative or omitted, read until EOF.**

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file)
    print()
    print(file.read())

In [None]:
file = open("fishes.txt", "r", encoding="utf-8")

print(file.read())

file.close()

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    content = file.read()

In [None]:
content

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.read(33))

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.read(33))
    content_33 = file.read(33)

In [None]:
content_33

In [None]:
len(content_33)

In [None]:
print(content)

In [None]:
print(content[:33])

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.read(33))
    print(file.read(33))

# seek() method 

**Change stream position.**

**Change the stream position to the given byte offset. The offset is
interpreted relative to the position indicated by whence.**

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.read(33))
    print(file.read(33))
    
    file.seek(0)
    print(file.read(33))

# tell() method

**Return current stream position**

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.read(33))
    print(file.read(33))
  
    file.seek(0)
    
    print(file.read(33))
    print(file.tell())

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.read(33))
    print(file.read(15))
    print(file.tell())
    
    file.seek(0)
    print(file.read(33))
    print(file.tell())

In [None]:
with open("rumi.txt", "r", encoding="utf-8") as rumi :
    print(rumi.read())

In [None]:
with open("rumi.txt", "r", encoding="utf-8") as rumi :
    print(rumi.read(33))
    print(rumi.read(15))
    print(rumi.tell())
    
    rumi.seek(0)
    
    print(rumi.read(33))
    print(rumi.tell())   
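
In text mode, the number returned by `tell()` is best treated as an opaque position that can be handed back to `seek()`. For text containing non-ASCII characters it does not have to equal the number of characters read so far. The sketch below illustrates this round trip under stated assumptions: the file `tea.txt` and its content are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = Path(tmp_dir) / "tea.txt"  # hypothetical file name
    text_path.write_text("çay and coffee", encoding="utf-8")

    with open(text_path, "r", encoding="utf-8") as file:
        first_three = file.read(3)   # the first three characters
        position = file.tell()       # remember where we are now
        rest_first = file.read()     # everything after those characters

        file.seek(position)          # jump back to the remembered position
        rest_second = file.read()    # the same remainder again

        file.seek(0)                 # back to the very beginning
        whole_text = file.read()

# Compare the remembered position with the number of characters read.
print(position, len(first_three))
```

Seeking to `0` or to a value previously returned by `tell()` is the reliable way to move around in a text file.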

# readline() method

**Read until newline or EOF.**

**Returns an empty string if EOF is hit immediately.**

In [None]:
"\n"  # the newline character: readline() reads up to and including it

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.readline())
    print(file.readline())
    print(file.readline())

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.readline(13))
    print(file.readline(13))
    print(file.readline(13))

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    print(file.readline(13))
    print(file.readline(13))
    print(file.readline(13))
    print(file.readline(13))

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    part_1 = file.readline(13)
    part_2 = file.readline(13)
    part_3 = file.readline(13)
    part_4 = file.readline(13)

In [None]:
part_1

In [None]:
part_2

In [None]:
part_3

In [None]:
part_4

In [None]:
print(part_3)

In [None]:
first_line = part_1 + part_2 + part_3
first_line

In [None]:
with open("rumi.txt", "r", encoding="utf-8") as rumi :
    print(rumi.read())

In [None]:
with open("rumi.txt", "r", encoding="utf-8") as rumi :
    
    print(rumi.readline())
    print(rumi.readline())
    print(rumi.readline(50))

# Getting and using the file names in a directory with "os" module

There is a Linux command for listing the files in the working directory: 'ls'.

**The "ls" command prints the names of the files in the current directory.**

In [None]:
# ls

In [None]:
import os

**The 'os' module has a listdir() method that does the same job as the 'ls' command in Linux.**

In [None]:
os.listdir()

**It gave us a list of all the files in the current directory as strings.**

**We will assign this list to a variable (files) to keep the file names:**

In [None]:
files = os.listdir()
files

Now I have a variable called 'files' that holds the names of all the files in my working directory as strings in a list.

This is a commonly used method. I can now do whatever I want with this data retrieved from my local machine. For example, we can use this list in a loop. Suppose there are tens of thousands of files in a directory: you can loop over their names to copy and archive these files somewhere.
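
As a small illustration of using the list in a loop, the sketch below keeps only the names that end with `.txt`. The directory and the file names are hypothetical and are created in a temporary location just for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a few empty files with hypothetical names.
    for name in ["a.txt", "b.csv", "c.txt"]:
        with open(os.path.join(tmp_dir, name), "w", encoding="utf-8"):
            pass

    # listdir() returns the names in arbitrary order, so sort them.
    txt_files = sorted(
        name for name in os.listdir(tmp_dir) if name.endswith(".txt")
    )

print(txt_files)  # ['a.txt', 'c.txt']
```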

<hr>

**You can also perform file archiving operations in Jupyter Notebook. There are some modules for this operation. Let's do an example with one of them.**

## Archiving a folder with the "shutil" module.

In [None]:
import shutil

In [None]:
help(shutil.make_archive)

``shutil.make_archive(
    base_name,
    format,
    root_dir=None,
    base_dir=None,
    verbose=0,
    dry_run=0,
    owner=None,
    group=None,
    logger=None,
)
Docstring:
Create an archive file (eg. zip or tar).``

In [None]:
# archive the files in the Files directory:

shutil.make_archive("files_zip", "zip", "Files")

# files_zip : The name of the archive file that will be created at the end of the process
# zip : the format of archive
# Files : The directory to be archived.

## Extract .zip file with "zipfile" module :

In [None]:
from zipfile import ZipFile

with ZipFile("files_zip.zip", "r") as files_zip:
    files_zip.extractall()
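
The archiving and extracting steps can also be tried end to end without touching the working directory. The sketch below is a self-contained variant under stated assumptions: it builds a small `Files` folder with one hypothetical file, `note.txt`, inside a temporary directory, archives it, and extracts the archive into a separate folder.

```python
import os
import shutil
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a small directory to archive.
    source_dir = os.path.join(tmp_dir, "Files")
    os.mkdir(source_dir)
    with open(os.path.join(source_dir, "note.txt"), "w", encoding="utf-8") as f:
        f.write("hello")

    # make_archive() adds the ".zip" extension and returns the archive path.
    archive_path = shutil.make_archive(
        os.path.join(tmp_dir, "files_zip"), "zip", source_dir
    )

    # Inspect the archive, then extract it into a separate folder.
    target_dir = os.path.join(tmp_dir, "extracted")
    with ZipFile(archive_path, "r") as archive:
        archived_names = archive.namelist()
        archive.extractall(target_dir)

    extracted_names = os.listdir(target_dir)

print(archived_names, extracted_names)
```

Passing a target folder to `extractall()` keeps the extracted files apart from the originals.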

In [None]:
# os.listdir()

# see the files_zip.zip file 

# readlines() method

**Return a list of lines from the stream.**

**hint can be specified to control the number of lines read: no more
lines will be read if the total size (in bytes/characters) of all
lines so far exceeds hint.**

In [None]:
with open("fishes.txt", "r", encoding = "utf-8") as file :
    fish = file.readlines()

In [None]:
fish

In [None]:
type(fish)

In [None]:
for line in fish:
    print(line)

In [None]:
with open("rumi.txt", "r", encoding="utf-8") as file :
    rumi = file.readlines()  # return a list from the stream

In [None]:
rumi

In [None]:
with open("fishes.txt", "r", encoding="utf-8") as file :
    for line in file :
        print(line)
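
Each line read from a file keeps its trailing newline character, and `print()` adds another one, which is why the output of such loops appears double spaced. One way to avoid this, sketched below with `io.StringIO` standing in for an open file (an assumption made for the example), is to strip the newline from each line.

```python
import io

# io.StringIO behaves like a text file that lives in memory.
stream = io.StringIO("salmon\ntrout\npike\n")

cleaned = []
for line in stream:
    # Remove the trailing newline that every line carries.
    cleaned.append(line.rstrip("\n"))

print(cleaned)  # ['salmon', 'trout', 'pike']
```

An alternative is `print(line, end="")`, which leaves the line unchanged and simply stops `print()` from adding a second newline.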

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Writing Files</p>

# write() method

**Write string to stream.**

**Returns the number of characters written (which is always equal to
the length of the string)**

In [None]:
with open("dummy_file.txt", "w", encoding ="utf-8") as file :
    file.write("This is the first line of my text file")
    
with open("dummy_file.txt", "r", encoding="utf-8") as file:
    print(file.read())

In [None]:
with open("dummy_file.txt", "w", encoding ="utf-8") as file :
    file.write("This is the new line of my dummy_file.txt file")
    
with open("dummy_file.txt", "r", encoding="utf-8") as file:
    print(file.read())
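
The two cells above show that opening a file in "w" mode discards its previous content. The sketch below, written against a temporary copy of a hypothetical `dummy_file.txt`, puts "w" next to "a" (append) and also captures the value returned by `write()`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "dummy_file.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as file:
        written = file.write("first\n")  # number of characters written

    with open(path, "w", encoding="utf-8") as file:
        file.write("second\n")           # "w" discards the previous content

    with open(path, "a", encoding="utf-8") as file:
        file.write("third\n")            # "a" keeps it and adds to the end

    final_content = path.read_text(encoding="utf-8")

print(written)        # 6
print(final_content)
```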

**Task: Use the list of fruit names below.**

**Modify the items as needed and write them to the fruits.txt file, each on a separate line, overwriting any existing content.**

In [None]:
fruits = ['Banana', 'Orange', 'Apple', 'Strawberry', 'Cherry']

In [None]:
with open("fruits.txt", "w", encoding="utf-8") as dosyam :
    for meyve in fruits :
        dosyam.write(meyve + "\n")
        
with open("fruits.txt", "r") as file:
    print(file.read())

# writelines() method

**Write a list of lines to stream.**

**Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.**

In [None]:
with open("fruits.txt") as file:
    fruits_1 = file.readlines()
    
fruits_1

In [None]:
with open("fruits9.txt", "w", encoding="utf-8") as ff :
    ff.writelines(fruits_1)

In [None]:
with open("fruits9.txt", "r", encoding="utf-8") as ff :
    print(ff.read())

In [None]:
with open("fruits9.txt", "a", encoding="utf-8") as ff :
    ff.writelines(fruits_1)

In [None]:
with open("fruits9.txt", "r", encoding="utf-8") as ff :
    print(ff.read())
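
The `writelines()` method writes the strings exactly as they are and does not insert line separators between them. The sketch below makes this visible with `io.StringIO` objects standing in for files (an assumption made so that nothing is written to disk).

```python
import io

names = ["Banana", "Orange", "Apple"]

# Items without a trailing newline run together.
without_newlines = io.StringIO()
without_newlines.writelines(names)

# Adding "\n" to each item puts every name on its own line.
with_newlines = io.StringIO()
with_newlines.writelines(name + "\n" for name in names)

print(repr(without_newlines.getvalue()))  # 'BananaOrangeApple'
print(repr(with_newlines.getvalue()))     # 'Banana\nOrange\nApple\n'
```

The list returned by `readlines()` already carries the newline characters, which is why it can be passed to `writelines()` unchanged.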

In [None]:
fruits_1

## adding a list to a list by using "+" operator

In [None]:
fruits_2 = fruits_1 + ["Lemon"]
fruits_2

In [None]:
a =  [1,2,3]
b = a + [4]
b

**Task : Let's add 'melon' to our existing fruits.txt file as the last line,**

**Read and display the entire file content,**

**Read and display the entire file content line by line in a list form.**

In [None]:
with open("fruits_9", "w", encoding="utf-8") as ff :
    ff.writelines(fruits_2)
    
with open("fruits_9", "r", encoding="utf-8") as ff :
    print(ff.read())

In [None]:
with open("fruits_9", "a", encoding="utf-8") as ff :
    ff.writelines(fruits_2)

with open("fruits_9", "r", encoding="utf-8") as ff :
    print(ff.read())
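
One possible way to approach the task as it is worded (adding 'melon' as the last line, then reading the file both as a whole and as a list) is sketched below. To leave the real fruits.txt untouched, the sketch first recreates the file in a temporary directory, which is an assumption made for the example.

```python
import tempfile
from pathlib import Path

fruits = ["Banana", "Orange", "Apple", "Strawberry", "Cherry"]

with tempfile.TemporaryDirectory() as tmp_dir:
    fruits_path = Path(tmp_dir) / "fruits.txt"

    # Recreate the file with one fruit per line.
    with open(fruits_path, "w", encoding="utf-8") as file:
        file.writelines(fruit + "\n" for fruit in fruits)

    # "a" mode adds the new line after the existing content.
    with open(fruits_path, "a", encoding="utf-8") as file:
        file.write("melon\n")

    with open(fruits_path, "r", encoding="utf-8") as file:
        whole_content = file.read()    # the entire content as one string
        file.seek(0)                   # go back to the beginning
        line_list = file.readlines()   # the same content as a list of lines

print(whole_content)
print(line_list)
```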

## Task : let's take one of the longer poems and do some exercises :

For example, **William Wordsworth's 'Daffodils'** poem.

This poem is one of the most famous examples of English Romanticism and is concerned with nature, human emotions, and imagination.

Let's download the Daffodils.txt file and paste it into the working directory.

### 1. Read and print out the poem.

In [None]:
with open("Daffodils.txt", "r", encoding = "utf-8") as file :
    print(file.read())

### 2. Add a new line every 6 lines, and write the new version of the poem back to the same file.

The final output will look like this:

<hr>

I will add a new line (enter) every 6 lines.<br>
So I should do this operation at multiples of 6.<br>
But how can we determine if a number is a multiple of 6?<br>
Yes, with modulus 6.<br>
If a number is divisible by 6 without remainder, it is a multiple of 6.<br>
So, if the modulus of a number divided by 6 returns 0, then that number is a multiple of 6.

<hr>

So, we need to use a counter.<br>
And we need to read the text line by line to traverse the lines.<br>
For this, we should use the readlines() method.

In [None]:
# First use the readlines() and see the output:

with open("Daffodils.txt", "r", encoding = "utf-8") as file :
    lines = file.readlines()
    
lines

Here is a list consisting of the lines of a poem as its elements.

Now we can traverse and process this list using a for loop.

In [None]:
counter = 0

with open("Daffodils.txt", "w", encoding="utf-8") as file:
    
    for i in lines:
        counter +=1
        if counter % 6 == 0 :
            file.write(i + "\n")
            
        else:
            file.write(i)

In [None]:
with open("Daffodils.txt", "r", encoding = "utf-8") as file :
    print(file.read())

<hr>

See, we have divided the poem into stanzas and added a new line every 6 lines using Python. 

We edited a file.
<hr>
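
The counter and modulus logic can also be written as a small function that works on any list of lines, which makes it easy to try out before touching a file. In the sketch below, the function name `add_blank_line_every` and the sample lines are hypothetical, and `enumerate()` replaces the manual counter.

```python
def add_blank_line_every(lines, n=6):
    """Return a new list with a blank line inserted after every n lines."""
    result = []
    # enumerate(..., start=1) plays the role of the manual counter.
    for counter, line in enumerate(lines, start=1):
        result.append(line)
        if counter % n == 0:
            result.append("\n")
    return result


sample_lines = [f"line {k}\n" for k in range(1, 8)]  # seven sample lines
spaced = add_blank_line_every(sample_lines, n=3)

print(spaced)
```

With seven sample lines and `n=3`, a blank line is inserted after the third and the sixth line.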

### 3. Let's reverse this process now. We will remove the new lines and restore the poem to its original form. 

**However, we will append the original version to the end of the file without overwriting it.**

**The file will contain the stanza-separated version first, and then the version without new lines.**

In [None]:
with open("Daffodils.txt", "r", encoding = "utf-8") as file :
    lines_new = file.readlines()

In [None]:
lines_new

<hr>

**You can solve this by iterating over the lines and implementing a condition to remove the new lines :**
<hr>

Since the current version will be preserved and the new version will be added below it, we should use the "a" mode.

Iterate over the lines in the lines_new list using a loop.

Write the lines that meet the condition to the file with "a" mode,

and skip the ones that do not meet the condition.

Look at the elements of the list, which are the lines of the poem. If a line consists of only a newline character, do not write it. If it is a line belonging to the poem and not just a newline character, then write it. This way, we will remove the blank lines:

In [None]:
with open("Daffodils.txt", "a", encoding="utf-8") as f:
    
    for i in lines_new:
        # write only the lines that are not blank:
        if i != "\n" :
            f.write(i)
            
with open("Daffodils.txt", "r", encoding = "utf-8") as file :
    print(file.read())

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Reading&Writing CSV Files</p>

**Python official document - CSV File Reading and Writing:   https://docs.python.org/3/library/csv.html**

**Link1 - Reading CSV files in python: https://www.geeksforgeeks.org/reading-csv-files-in-python/**

**Link2 - Writing CSV files in python: https://www.geeksforgeeks.org/writing-csv-files-in-python/**

**Link3 - IO tools (text, CSV, HDF5, …): https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html**

<a id="toc"></a>

## <p style="background-color:#F39C12; font-family:newtimeroman; color:#FFF9ED; font-size:125%; text-align:center; border-radius:10px 10px;">1. Ordinary Reading Method of CSV Files</p>

In [None]:
with open("titanic.csv", "r", encoding="utf-8") as file:
    print(file.read(500))

**Create people.csv file in your working directory as below:**

```
row num,first name,last name,ages
1,isabella,bold,22
2,lara,bold,12
3,solomon,bold,46
4,adam,smithson,40
5,mose,smithson,51
```

In [None]:
with open("people.csv", "r", encoding="utf-8") as file :
    print(file.read())

In [None]:
with open("titanic.csv") as file :
    print(file.read(500))

<a id="toc"></a>

## <p style="background-color:#F39C12; font-family:newtimeroman; color:#FFF9ED; font-size:125%; text-align:center; border-radius:10px 10px;">2. Reading the CSV Files with csv Module</p>

In [None]:
import csv

In [None]:
with open("people.csv", "r", newline="", encoding="utf-8") as file:
    csv_rows = csv.reader(file)  # reader() returns each row
                                # (line) as a list of strings
        
    for row in csv_rows:
        print(row)

In [None]:
with open("people.csv", "r", newline="", encoding="utf-8") as file:
    csv_rows = csv.reader(file, delimiter=":")  
# we specified a char ":" that is not used in the csv file as a value of delimiter 
                                               
    for row in csv_rows:
        print(row)
        
# Note: because ":" never occurs in the file, we get each row as a single string in a list!

In [None]:
with open("titanic.csv", "r", newline="", encoding="utf-8") as file:
    csv_rows = csv.reader(file)  # reader() returns each row
                                # (line) as a list of strings
        
    for idx, row in enumerate(csv_rows):
        print(row)
        if idx > 20:
            break
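
The `csv` module also provides `csv.DictReader`, which uses the header row as keys, so each row can be accessed by column name instead of by position. The sketch below reads the first rows of the people.csv content shown earlier from an in-memory string (an assumption made so that the example does not need the file on disk).

```python
import csv
import io

# The first rows of people.csv, kept in memory for the example.
people_csv = (
    "row num,first name,last name,ages\n"
    "1,isabella,bold,22\n"
    "2,lara,bold,12\n"
    "3,solomon,bold,46\n"
)

# DictReader takes the first row as the column names.
reader = csv.DictReader(io.StringIO(people_csv))
people = list(reader)

# All values are read as strings, so convert the ages explicitly.
ages = [int(person["ages"]) for person in people]

print(people[0]["first name"])  # isabella
print(ages)                     # [22, 12, 46]
```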

<a id="toc"></a>

## <p style="background-color:#F39C12; font-family:newtimeroman; color:#FFF9ED; font-size:125%; text-align:center; border-radius:10px 10px;">3. Reading & Writing the CSV Files with Pandas</p>

In [None]:
# !pip install pandas

In [None]:
import pandas as pd

## read_csv() method

**Read a comma-separated values (csv) file into DataFrame.**

In [None]:
df = pd.read_csv("titanic.csv")
df

In [None]:
df.Sex.value_counts(dropna = False)

# I took only the Sex column and counted how many times each value occurs.

In [None]:
# Create the "ladies" variable from the rows whose "Sex" column contains the string "female"

ladies = df[df["Sex"] == "female"]
ladies

## to_csv() method

**Write object to a comma-separated values (csv) file.**

In [None]:
ladies.to_csv("titanic_ladies.csv", index = False)

# Created a new csv file named "titanic_ladies.csv"

In [None]:
# Let's read our new csv file by using read_csv() method:

pd.read_csv("titanic_ladies.csv")

In [None]:
dead = df[df["Survived"] == 0]
dead

In [None]:
dead.to_csv("titanic_dead.csv", index = False)

In [None]:
pd.read_csv("titanic_dead.csv")

## Let's do a few small coding examples in Pandas :

In [None]:
df.groupby("Sex")["Survived"].mean()

In [None]:
df[df["Sex"] == "female"] 

In [None]:
df[df["Sex"] == "female"]["Survived"].value_counts()

In [None]:
df.groupby("Pclass")["Survived"].mean()

In [None]:
df.hist(column="Pclass", by="Survived")

In [None]:
df.hist(column="Sex", by="Survived")

In the left chart, we observe that the number of men who died (the Survived column value of 0) is much higher than that of women. In the right chart, we see the graph of survivors (the Survived column value of 1). There were many more women who survived than men. By plotting simple charts, we obtained an insight.