## Interacting with the OS and the File System

The os module in Python provides many functions for interacting with the operating system and the file system.

In [1]:
import os 

In [2]:
# We can check the present working directory using the os.getcwd function.
# Example:
os.getcwd()

'c:\\Users\\PC\\Desktop\\Data science\\numpy training'

In [3]:
# To get the list of files in a directory, we use the function os.listdir. You pass an absolute or relative path of a directory as the argument to the function.
help(os.listdir)

Help on built-in function listdir in module nt:

listdir(path=None)
    Return a list containing the names of the files in the directory.

    path can be specified as either str, bytes, or a path-like object.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    If path is None, uses the path='.'.
    On some platforms, path may also be specified as an open file descriptor;\
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.

    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.
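
Since os.listdir returns bare names for files and subdirectories alike, one way to tell them apart is to join each name back onto the directory path and test it with os.path.isdir or os.path.isfile. The sketch below is illustrative only: it builds a throwaway directory with tempfile, and the names notes.txt and data are made up for the example.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on local files.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data"))            # a subdirectory
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("hello\n")                            # a regular file

    # listdir returns names only, in arbitrary order, so sort for stable output.
    names = sorted(os.listdir(tmp))

    # Join each name with the directory before testing it.
    dirs = [n for n in names if os.path.isdir(os.path.join(tmp, n))]
    files = [n for n in names if os.path.isfile(os.path.join(tmp, n))]

print(names)  # ['data', 'notes.txt']
print(dirs)   # ['data']
print(files)  # ['notes.txt']
```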



In [4]:
# If no path is provided, listdir defaults to the current directory.
os.listdir()

['Amazing numpy functions.ipynb',
 'basics of numpy-checkpoint.ipynb',
 'basics of numpy.ipynb',
 'climate_results.txt',
 'data',
 'Numerical Analysis with Numpy.ipynb',
 'reading and wrting files in python.ipynb']

In [5]:
# listing of files in the parent directory
os.listdir("..")

['.git',
 '.ipynb_checkpoints',
 'basic python',
 'linear regression.ipynb',
 'numpy training']

In [6]:
#List the contents of a specific directory by providing its path as a string
content= os.listdir("c:\\Users\\PC\\Desktop\\Home\\data science Training\\numpy training")
for entry in content:
    print(entry)

basics of numpy.ipynb
reading and wrting files in python.ipynb
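
Hard-coded paths with doubled backslashes only work on Windows. A minimal sketch of a more portable approach, assuming a relative folder named data and a file named loans1.txt purely for illustration, is to build paths with os.path.join and resolve them with os.path.abspath:

```python
import os

# Hypothetical folder and file names, used only to illustrate path building.
relative = os.path.join("data", "loans1.txt")

# os.path.join inserts the separator that the current OS expects.
print(relative)

# os.path.abspath resolves a relative path against the current working directory.
absolute = os.path.abspath(relative)
print(absolute)
```

The printed values depend on the operating system and the current working directory, so no fixed output is shown here.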


In [7]:
# List the contents of a specific directory by providing its path as a bytes object; the returned names are then bytes as well.
content=os.listdir(b"c:\\Users\\PC\\Desktop\\Home\\data science Training\\numpy training")
for entry in content:
    print(entry)

b'basics of numpy.ipynb'
b'reading and wrting files in python.ipynb'


In [8]:
#Handle potential exceptions such as FileNotFoundError and PermissionError.
import os

path = '/non/existent/path'  # Example path that might not exist
try:
    for entry in os.listdir(path):
        print(entry)
except FileNotFoundError:
    print("Directory not found.")
except PermissionError:
    print("Permission denied.")

Directory not found.


In [9]:
# A new directory can be created using os.makedirs.
# Let's create a new directory called data, where we'll later download some files.
os.makedirs("./data",exist_ok=True)
# exist_ok is a parameter that tells the function how to handle the case where the directory already exists.
# If exist_ok is set to True, the function will not raise an error if the directory already exists; it will simply do nothing.

In [10]:
#confirming that the directory has been indeed created
'data' in os.listdir('.')

True
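
To see what exist_ok changes, the following sketch calls os.makedirs several times on the same path inside a temporary directory (the directory names are hypothetical). It also shows that makedirs creates missing intermediate directories.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data")

    os.makedirs(target)                  # first call creates the directory
    os.makedirs(target, exist_ok=True)   # second call does nothing

    try:
        os.makedirs(target)              # exist_ok defaults to False
        raised = False
    except FileExistsError:
        raised = True

    # makedirs also creates any missing intermediate directories.
    nested = os.path.join(tmp, "a", "b", "c")
    os.makedirs(nested)
    nested_exists = os.path.isdir(nested)

print(raised)         # True
print(nested_exists)  # True
```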

In [11]:
# Let us download some files into the data directory using the urllib module.
url1='https://gist.githubusercontent.com/aakashns/8de7b03f241b787042be1a1e4afd91da/raw/a15ba86d0260ca2d2615afa9d809bf4be019ab3d/loans1.txt'
url2='https://gist.githubusercontent.com/aakashns/8de7b03f241b787042be1a1e4afd91da/raw/a15ba86d0260ca2d2615afa9d809bf4be019ab3d/loans2.txt'
url3='https://gist.githubusercontent.com/aakashns/8de7b03f241b787042be1a1e4afd91da/raw/a15ba86d0260ca2d2615afa9d809bf4be019ab3d/loans3.txt'

In [12]:
import urllib.request

In [13]:
urllib.request.urlretrieve(url1,'./data/loans1.txt')
urllib.request.urlretrieve(url2,'./data/loans2.txt')
urllib.request.urlretrieve(url3,'./data/loans3.txt')

('./data/loans3.txt', <http.client.HTTPMessage at 0x20795009df0>)

In [14]:
#let's confirm that the files were downloaded
os.listdir('./data')

['loans1.txt', 'loans2.txt', 'loans3.txt']
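
Re-running the download cell fetches the files again every time. One possible refinement, sketched here with a hypothetical helper called download_if_missing (not part of urllib), is to skip the download when the target file already exists:

```python
import os
import urllib.request

def download_if_missing(url, path):
    """Download url to path unless something already exists at path.

    Hypothetical helper for illustration; returns True if a download happened.
    """
    if os.path.exists(path):
        return False
    # Create the parent directory if the path has one.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    urllib.request.urlretrieve(url, path)
    return True
```

It could then be called as download_if_missing(url1, './data/loans1.txt'). Note that this only checks for existence; it does not verify that an earlier download completed successfully.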

## Reading from a file 
To read the contents of a file, we first need to open it using the built-in open function. The open function returns a file object, which provides several methods for interacting with the contents of the file. It also accepts a mode argument.
The mode argument specifies how we intend to interact with the file:

'r'-open for reading (default)

'w'-open for writing, truncating the file first

'x'-create a new file and open it for writing (fails if the file already exists)

'a'-open for writing, appending to the end of the file if it exists

'b'-binary mode

't'-text mode (default)

'+'-open a disk file for updating (reading and writing)

'U'-universal newline mode (deprecated; removed in Python 3.11)
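
The difference between the modes is easiest to see by trying them on a scratch file. The sketch below uses a temporary directory and a made-up file name, demo.txt, and exercises 'w', 'a', 'r', 'x' and 'rb' in turn.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")   # hypothetical file name

    with open(path, "w") as f:     # 'w' creates the file (or truncates it)
        f.write("first\n")

    with open(path, "a") as f:     # 'a' appends to the end
        f.write("second\n")

    with open(path, "r") as f:     # 'r' reads; text mode is the default
        after_append = f.read()

    try:
        open(path, "x")            # 'x' refuses to overwrite an existing file
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(path, "w") as f:     # 'w' again: previous contents are discarded
        f.write("third\n")

    with open(path, "rb") as f:    # 'b' returns bytes instead of str
        raw = f.read()

print(repr(after_append))  # 'first\nsecond\n'
print(x_failed)            # True
print(type(raw))           # <class 'bytes'>
```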

In [15]:
file1=open('./data/loans1.txt', mode='r')

In [16]:
# To view the contents of the file, we can use the read method of the file object.
file1_contents=file1.read()
print(file1_contents)

amount,duration,rate,down_payment
10000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


In [17]:
# As you can see, the contents of the file are returned as a string.
type(file1_contents)

str

In [18]:
# It is important to close files once you are done with them: each open file holds an operating system resource (a file handle),
# and data written to a file may not be flushed to disk until the file is closed.
file1.close()

## Closing files automatically using with

To make it easy to automatically close a file once you are done processing it, you can open it using the with statement.

In [19]:
with open('./data/loans2.txt', 'r') as file2:
    file2_contents=file2.read()
    print(file2_contents)
# Once the statements within the with block have been executed, the .close method of file2 is automatically invoked.

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300



In [20]:
#to confirm this, let's try reading the content of the file once again
try:
    print(file2.read())
except ValueError:
    print("The file is closed")


The file is closed
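
Another way to check this, without triggering an exception, is the closed attribute of the file object. A minimal sketch using a scratch file (sample.txt is a made-up name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("amount,duration\n")
        inside = f.closed       # False: still open inside the block

    outside = f.closed          # True: closed when the block exited

print(inside, outside)  # False True
```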


## Reading a file line by line
File objects provide a readlines method, which reads the whole file and returns its lines as a list of strings.

In [21]:
with open('./data/loans3.txt') as file3:
    file3_lines=file3.readlines()
    print(file3_lines)


['amount,duration,rate,down_payment\n', '883000,16,0.14,\n', '45230,48,0.07,4300\n', '100000,12,0.1,\n', '728400,120,0.12,100000\n', '3637400,240,0.06,\n', '82900,90,0.07,8900\n', '316000,16,0.13,\n', '15230,48,0.08,4300\n', '991360,99,0.08,\n', '323000,27,0.09,47200\n', '10000,36,0.08,20000\n', '528400,120,0.11,100000\n', '8633400,240,0.06,\n', '12900,90,0.08,8900\n']
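
For large files, loading every line into a list at once may be wasteful. A file object can also be iterated over directly, which yields one line at a time. The sketch below writes a small scratch file (loans_sample.txt, a made-up name holding the first three lines shown above) and strips the trailing newline from each line as it is read.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loans_sample.txt")  # hypothetical file
    with open(path, "w") as f:
        f.write("amount,duration,rate,down_payment\n")
        f.write("883000,16,0.14,\n")
        f.write("45230,48,0.07,4300\n")

    cleaned = []
    with open(path) as f:
        for line in f:                      # one line at a time
            cleaned.append(line.rstrip("\n"))

print(cleaned)
# ['amount,duration,rate,down_payment', '883000,16,0.14,', '45230,48,0.07,4300']
```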


## Processing Data from files

The contents of a file are read as one large string (or a list of strings), so they usually need to be converted into a more structured form before processing. For instance, the file loans1.txt contains information about loans in CSV format. We can process it as follows:
1. Read the file line by line

2. Parse the first line to get the list of the column names or headers

3. Split each remaining line on commas and convert each value to a float

4. Create a dictionary for each loan using headers as keys

5. Create a list of dictionaries to keep track of all loans

Since all the other files will require the same operations, it would be useful to define a function read_csv to do this. We'll also define some helper functions to build up the functionality step by step.

In [22]:
def parse_headers(header_line):
    return header_line.strip().split(',')

In [23]:
headers=parse_headers(file3_lines[0])
headers

['amount', 'duration', 'rate', 'down_payment']

In [24]:
def parse_values(data_line):
    values=[]
    for item in data_line.strip().split(','):
        if item=='':
            # a missing value shows up as an empty field; treat it as 0.0
            values.append(0.0)
        else:
            values.append(float(item))
    return values

In [25]:
parse_values(file3_lines[2])

[45230.0, 48.0, 0.07, 4300.0]
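
The line used above has no missing fields. To see the empty-field branch in action, the sketch below repeats the function in condensed form and applies it to the second line of loans3.txt from the readlines output, which has no down payment.

```python
def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        # An empty field means the value is missing; treat it as 0.0.
        values.append(0.0 if item == '' else float(item))
    return values

# Second line of loans3.txt, where the down_payment field is empty.
row = parse_values('883000,16,0.14,\n')
print(row)  # [883000.0, 16.0, 0.14, 0.0]
```

Replacing a missing value with 0.0 suits a down payment, but it may not be appropriate for every column.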

Let's define a function create_item_dict, which takes a list of values and a list of headers as inputs and returns a dictionary with the values associated with their respective headers as keys.

In [26]:
def create_item_dict(values,headers):
    result={}
    for value,header in zip(values,headers):
        result[header]=value
    return result


In [27]:
values1=parse_values(file3_lines[2])

In [28]:
create_item_dict(values1,headers)

{'amount': 45230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}
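
As an aside, the same pairing can be written in one expression by passing the zipped pairs straight to dict. This is an equivalent alternative, not a change to the function above:

```python
headers = ['amount', 'duration', 'rate', 'down_payment']
values = [45230.0, 48.0, 0.07, 4300.0]

# zip pairs each header with the value in the same position;
# dict turns those pairs into key/value entries.
item = dict(zip(headers, values))
print(item)
# {'amount': 45230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}
```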

We can put it all together and define the read_csv function

In [29]:
def read_csv(path):
    result=[]
    #open the file in read mode
    with open(path, 'r') as f:
        #get a list of lines
        lines=f.readlines()
        #parse the header
        headers=parse_headers(lines[0])
        #loop over the remaining lines
        for data_line in lines[1:]:
            #parse the values
            values=parse_values(data_line)
            #create a dictionary using values & headers
            item_dict=create_item_dict(values,headers)
            #add the dictionary to the result
            result.append(item_dict)
    return result


In [68]:
loans1=read_csv('./data/loans1.txt')
loans1

[{'amount': 10000.0, 'duration': 36.0, 'rate': 0.08, 'down_payment': 20000.0},
 {'amount': 200000.0, 'duration': 12.0, 'rate': 0.1, 'down_payment': 0.0},
 {'amount': 628400.0,
  'duration': 120.0,
  'rate': 0.12,
  'down_payment': 100000.0},
 {'amount': 4637400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.07, 'down_payment': 8900.0},
 {'amount': 916000.0, 'duration': 16.0, 'rate': 0.13, 'down_payment': 0.0},
 {'amount': 45230.0, 'duration': 48.0, 'rate': 0.08, 'down_payment': 4300.0},
 {'amount': 991360.0, 'duration': 99.0, 'rate': 0.08, 'down_payment': 0.0},
 {'amount': 423000.0, 'duration': 27.0, 'rate': 0.09, 'down_payment': 47200.0}]
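
For comparison, Python's standard library includes a csv module whose DictReader performs the header parsing and dictionary building for us. The sketch below is an alternative to the hand-written read_csv, not the approach used in this notebook. It reads from an in-memory io.StringIO holding the first rows of loans1.txt instead of a real file.

```python
import csv
import io

# In-memory stand-in for a file, using the first rows of loans1.txt.
sample = io.StringIO(
    "amount,duration,rate,down_payment\n"
    "10000,36,0.08,20000\n"
    "200000,12,0.1,\n"
)

loans = []
for row in csv.DictReader(sample):
    # DictReader yields strings; convert them, mapping '' to 0.0 as read_csv does.
    loans.append({key: float(value) if value else 0.0 for key, value in row.items()})

print(loans[1])
# {'amount': 200000.0, 'duration': 12.0, 'rate': 0.1, 'down_payment': 0.0}
```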

In [69]:
import math
def loan_emi(amount, duration, rate, down_payment=0):
    """Calculate the equal monthly installment (EMI) for a loan.

    Arguments:
        amount - total amount of the loan (principal)
        duration - duration of the loan (in months)
        rate - rate of interest (monthly)
        down_payment (optional) - initial payment (deducted from amount)
    """
    loan_amount =amount -down_payment
    try:
        emi =loan_amount*rate*((1+rate)**duration)/(((1+rate)**duration)-1)
    except ZeroDivisionError:
        emi=loan_amount /duration
    emi=math.ceil(emi)
    return emi
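
The except branch is there for the interest-free case: when rate is 0, the denominator of the formula is zero, so the principal is simply split evenly across the months. The sketch below repeats the function without its docstring and checks both branches. The interest-free loan of 120000 over 12 months is a made-up example, while the second call uses the figures from the second row of loans1.txt.

```python
import math

def loan_emi(amount, duration, rate, down_payment=0):
    loan_amount = amount - down_payment
    try:
        emi = loan_amount * rate * ((1 + rate) ** duration) / (((1 + rate) ** duration) - 1)
    except ZeroDivisionError:
        # With rate == 0 the denominator is zero, so split the principal evenly.
        emi = loan_amount / duration
    return math.ceil(emi)

# Hypothetical interest-free loan: 120000 repaid over 12 months.
zero_rate = loan_emi(120000, 12, 0)
print(zero_rate)  # 10000

# 10% annual rate converted to a monthly rate by dividing by 12.
with_interest = loan_emi(200000, 12, 0.1 / 12)
print(with_interest)  # 17584
```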

In [63]:
def compute_emis(loans):
    for loan in loans:
        #loan_emi expects a monthly rate, so the rate from the file is divided by 12
        loan["emi"]=loan_emi(loan['amount'],loan["duration"],loan["rate"]/12,loan["down_payment"])
    return loans      
        

In [73]:
compute_emis(loans1)

[{'amount': 10000.0,
  'duration': 36.0,
  'rate': 0.08,
  'down_payment': 20000.0,
  'emi': -313},
 {'amount': 200000.0,
  'duration': 12.0,
  'rate': 0.1,
  'down_payment': 0.0,
  'emi': 17584},
 {'amount': 628400.0,
  'duration': 120.0,
  'rate': 0.12,
  'down_payment': 100000.0,
  'emi': 7582},
 {'amount': 4637400.0,
  'duration': 240.0,
  'rate': 0.06,
  'down_payment': 0.0,
  'emi': 33224},
 {'amount': 42900.0,
  'duration': 90.0,
  'rate': 0.07,
  'down_payment': 8900.0,
  'emi': 487},
 {'amount': 916000.0,
  'duration': 16.0,
  'rate': 0.13,
  'down_payment': 0.0,
  'emi': 62664},
 {'amount': 45230.0,
  'duration': 48.0,
  'rate': 0.08,
  'down_payment': 4300.0,
  'emi': 1000},
 {'amount': 991360.0,
  'duration': 99.0,
  'rate': 0.08,
  'down_payment': 0.0,
  'emi': 13712},
 {'amount': 423000.0,
  'duration': 27.0,
  'rate': 0.09,
  'down_payment': 47200.0,
  'emi': 15428}]

## Writing to files 
Now that we have processed the data, we can write the results back to a file in CSV format. We can do this by creating/opening a file in write mode with open and using the .write method of the file object. The string format method is useful for building each line.

In [75]:
with open ('./data/emis1.txt', 'w') as f:
    for loan in loans1:
        f.write('{},{},{},{},{}\n'.format(
            loan["amount"],
            loan["duration"],
            loan["rate"],
            loan["down_payment"],
            loan["emi"]))

In [76]:
#let's check the file was created
os.listdir ('data')

['emis.txt',
 'emis1.txt',
 'emis2.txt',
 'loans1.txt',
 'loans2.txt',
 'loans3.txt']

In [77]:
with open ("./data/emis1.txt", 'r') as f:
    print(f.read())

10000.0,36.0,0.08,20000.0,-313
200000.0,12.0,0.1,0.0,17584
628400.0,120.0,0.12,100000.0,7582
4637400.0,240.0,0.06,0.0,33224
42900.0,90.0,0.07,8900.0,487
916000.0,16.0,0.13,0.0,62664
45230.0,48.0,0.08,4300.0,1000
991360.0,99.0,0.08,0.0,13712
423000.0,27.0,0.09,47200.0,15428



In [80]:
#a file that is no longer needed can be deleted using os.remove
file_path="./data/emis.txt"
os.remove(file_path)

In [81]:
os.listdir('data')

['emis1.txt', 'loans1.txt', 'loans2.txt', 'loans3.txt']
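
Calling os.remove on a path that does not exist raises FileNotFoundError, so running the removal cell twice would fail. A minimal sketch of handling that case, using a scratch file in a temporary directory (the file name is reused only for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emis.txt")   # hypothetical scratch file
    open(path, "w").close()                # create an empty file

    os.remove(path)
    gone = not os.path.exists(path)

    try:
        os.remove(path)                    # file is already gone
        second_attempt = "removed"
    except FileNotFoundError:
        second_attempt = "not found"

print(gone, second_attempt)  # True not found
```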

## Let's define a generic function write_csv, which takes a list of dictionaries and writes them to a file in CSV format, including the headers in the first line.

In [1]:
def write_csv(items, path):
    #open the file in write mode
    with open (path, 'w') as f:
        #return if there is nothing to write
        if len(items)==0:
            return
        #write the headers in the first line
        headers=list(items[0].keys())
        f.write(','.join(headers)+'\n')

        #write one item per line 
        for item in items:
            values =[]
            for header in headers:
                values.append(str(item.get(header,"")))
            f.write(','.join(values)+"\n")
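
As a quick check, the sketch below repeats write_csv in condensed form, writes two made-up records to a temporary file and reads the result back. The second record deliberately lacks the emi key to show that item.get falls back to an empty field.

```python
import os
import tempfile

def write_csv(items, path):
    with open(path, 'w') as f:
        if len(items) == 0:
            return
        headers = list(items[0].keys())
        f.write(','.join(headers) + '\n')
        for item in items:
            values = [str(item.get(header, "")) for header in headers]
            f.write(','.join(values) + "\n")

# Two made-up records; the second lacks 'emi' to show the empty-field fallback.
items = [
    {'amount': 10000.0, 'duration': 36.0, 'emi': 314},
    {'amount': 20000.0, 'duration': 12.0},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    write_csv(items, path)
    with open(path) as f:
        written = f.read()

print(written)
```

Because the headers come from the first item only, any keys that appear solely in later items are not written.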
            