### Interacting with the OS and filesystem

The `os` module in Python provides many functions for interacting with the OS and the filesystem. Let's import it and try out some examples.

In [1]:
import os

In [2]:
os.getcwd()

'/home/jovyan'

##### To get the list of files in a directory, use `os.listdir`. You pass an absolute or relative path of a directory as the argument to the function.

In [3]:
help(os.listdir)

Help on built-in function listdir in module posix:

listdir(path=None)
    Return a list containing the names of the files in the directory.
    
    path can be specified as either str, bytes, or a path-like object.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    If path is None, uses the path='.'.
    On some platforms, path may also be specified as an open file descriptor;\
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.
    
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.



##### relative path

In [4]:
os.listdir('.')   # relative path

['.bash_logout',
 '.bashrc',
 '.profile',
 '.local',
 '.ipython',
 '.jovianrc',
 '.cache',
 '.jupyter',
 '.jovian',
 '.ipynb_checkpoints',
 'Untitled.ipynb',
 '.npm',
 'work',
 '.git',
 '.wget-hsts',
 '.config',
 '.conda']

In [5]:
os.listdir('/usr')   # absolute path

['lib32',
 'sbin',
 'games',
 'lib64',
 'lib',
 'bin',
 'include',
 'libx32',
 'local',
 'share',
 'src']
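As the help text above notes, `os.listdir` returns names in arbitrary order and does not distinguish files from directories. Here is a small sketch of sorting the result and keeping only regular files. It works in a temporary directory, and the names `a.txt`, `b.txt` and `sub` are made up for the illustration.

```python
import os
import tempfile

# Build a throwaway directory with two files and one subdirectory
# (all names here are hypothetical, chosen only for the demo).
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b.txt', 'a.txt'):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('demo')
    os.makedirs(os.path.join(tmp, 'sub'))

    # os.listdir makes no promise about order, so sort for stable output.
    entries = sorted(os.listdir(tmp))

    # os.listdir returns bare names, so join each one with the directory
    # before asking whether it is a regular file.
    files_only = [e for e in entries if os.path.isfile(os.path.join(tmp, e))]

print(entries)
print(files_only)
```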

##### You can create a new directory using `os.makedirs`.

##### Let's create a new directory called data, where we'll later download some files.

In [6]:
os.makedirs('./data', exist_ok = True)

##### Can you figure out what the argument `exist_ok` does? Try using the `help` function or [read the documentation](https://docs.python.org/3/library/os.html#os.makedirs).
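If you'd like to check your answer, the following sketch calls `os.makedirs` on the same path several times inside a temporary directory, so it does not touch your real `data` folder. The variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'data')

    os.makedirs(target, exist_ok=True)   # creates the directory
    os.makedirs(target, exist_ok=True)   # already exists: no error is raised

    try:
        os.makedirs(target)              # exist_ok defaults to False
        raised = False
    except FileExistsError:
        raised = True                    # the directory was already there

print(raised)
```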

##### Let's verify that the directory was created and is currently empty.

In [7]:
'data' in os.listdir('.')

True

In [8]:
os.listdir('./data')

[]

##### Let us download some files into the data directory using the urllib module.

In [136]:
url_1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url_2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url_3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [137]:
url_1

'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'

In [138]:
# import the 'urlretrieve' function from the 'urllib.request' module.
from urllib.request import urlretrieve

In [139]:
urlretrieve(url_1, './data/loans_1.txt')

('./data/loans_1.txt', <http.client.HTTPMessage at 0x7f7026f98b20>)

In [140]:
urlretrieve(url_2, './data/loans_2.txt')

('./data/loans_2.txt', <http.client.HTTPMessage at 0x7f7026f98f10>)

In [141]:
urlretrieve(url_3, './data/loans_3.txt')

('./data/loans_3.txt', <http.client.HTTPMessage at 0x7f7026cca370>)

##### Let's verify that the files were downloaded.

In [142]:
os.listdir('./data')

['loans_3.txt', 'loans_1.txt', 'loans_2.txt']

You can also use the [`requests`](https://docs.python-requests.org/en/master/) library to download URLs, although you'll need to [write some additional code](https://stackoverflow.com/questions/44699682/how-to-save-a-file-downloaded-from-requests-to-another-directory) to save the contents of the page to a file.

### Reading from a file 

To read the contents of a file, we first need to open the file using the built-in `open` function. The `open` function returns a file object and provides several methods for interacting with the file's contents.

The `open` function also accepts a `mode` argument that specifies how we can interact with the file. The following options are supported:

```
    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================
```
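To get a feel for a few of the modes in the table, here is a minimal sketch that writes, appends to, and reads back a scratch file. The file name `notes.txt` is hypothetical and lives in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')   # hypothetical scratch file

    with open(path, 'w') as f:      # 'w' creates the file (or empties an existing one)
        f.write('first\n')
    with open(path, 'a') as f:      # 'a' keeps existing content and adds to the end
        f.write('second\n')
    with open(path, 'r') as f:      # 'r' is the default mode
        contents = f.read()

    try:
        open(path, 'x')             # 'x' refuses to open a file that already exists
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(contents))
print(x_failed)
```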

To view the contents of the file, we can use the `read` method of the file object.

In [16]:
file_1 = open('./data/loans_1.txt', mode='r')

In [17]:
file_1_contents = file_1.read()

In [18]:
print(file_1_contents)

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


In [19]:
file_1.close()

##### Once a file is closed, you can no longer read from it.

In [20]:
file_1.read()   # try to read the file after closing it

ValueError: I/O operation on closed file.

### Closing files automatically using `with`

To close a file automatically after you've processed it, you can open it using the `with` statement.

In [21]:
with open('./data/loans_2.txt') as file2:
    file2_contents = file2.read()
    print(file2_contents)

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


##### Once the statements within the `with` block are executed, the `.close` method on `file2` is automatically invoked. 

##### Let's verify this by trying to read from the file object again.

In [22]:
file2.read()

ValueError: I/O operation on closed file.
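File objects also expose a `closed` attribute, which gives another way to observe what `with` does. The sketch below uses a scratch file in a temporary directory; the file name is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')   # hypothetical scratch file
    with open(path, 'w') as f:
        f.write('hello\n')

    with open(path) as f:
        open_inside = f.closed      # False: the file is still open inside the block
        text = f.read()
    closed_after = f.closed         # True: closed as soon as the block ends

    try:
        f.read()                    # reading a closed file raises ValueError
        error = None
    except ValueError as e:
        error = str(e)

print(open_inside, closed_after, error)
```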

### Reading a file line by line


File objects provide a `readlines` method that reads the whole file and returns its lines as a list.

In [26]:
with open('./data/loans_3.txt', 'r') as file3:
    file3_lines_list = file3.readlines()

In [28]:
file3_lines_list

['amount,duration,rate,down_payment\n',
 '45230,48,0.07,4300\n',
 '883000,16,0.14,\n',
 '100000,12,0.1,\n',
 '728400,120,0.12,100000\n',
 '3637400,240,0.06,\n',
 '82900,90,0.07,8900\n',
 '316000,16,0.13,\n',
 '15230,48,0.08,4300\n',
 '991360,99,0.08,\n',
 '323000,27,0.09,4720010000,36,0.08,20000\n',
 '528400,120,0.11,100000\n',
 '8633400,240,0.06,\n',
 '12900,90,0.08,8900']
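A file object can also be looped over directly, which yields one line at a time instead of building the full list first. A small sketch, using a made-up three-line file in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'loans.txt')   # hypothetical sample file
    with open(path, 'w') as f:
        f.write('amount,duration\n45230,48\n883000,16')

    rows = []
    with open(path) as f:
        for line in f:                        # yields one line per iteration
            rows.append(line.rstrip('\n'))    # drop the trailing newline, if any

print(rows)
```

Note that the last line has no trailing `\n`, just like the last entry of `file3_lines_list` above, so `rstrip('\n')` leaves it unchanged.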

### Processing data from files

Before performing any operations on the data stored in a file, we need to convert the file's contents from one large string into Python data types. For the file `loans1.txt` containing information about loans in a CSV format, we can do the following:

* Read the file line by line
* Parse the first line to get a list of the column names or headers
* Split each remaining line and convert each value into a float
* Create a dictionary for each loan using the headers as keys
* Create a list of dictionaries to keep track of all the loans

Since we will perform the same operations for multiple files, it would be useful to define a function `read_csv`. We'll also define some helper functions to build up the functionality step by step. 

Let's start by defining a function `parse_headers` that takes a line as input and returns a list of column headers.

### writing `parse_headers` function:

In [29]:
def parse_headers(header_line):
    return header_line.strip().split(',')

# strip() removes leading and trailing whitespace, including the newline.
# split(',') splits the string into a list at each comma.

In [30]:
# verifying the location of header row of loans_3.txt file.

file3_lines_list[0]

'amount,duration,rate,down_payment\n'

In [31]:
file3_lines_list[1]

'45230,48,0.07,4300\n'

In [32]:
file3_headers = parse_headers(file3_lines_list[0])

In [33]:
file3_headers

['amount', 'duration', 'rate', 'down_payment']

##### Next, let's define a function `parse_values` that takes a line containing some data and returns a list of floating-point numbers.

### writing `parse_values` function:

In [39]:
def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        values.append(float(item))
    
    return values

In [40]:
file3_lines_list[1]

'45230,48,0.07,4300\n'

In [41]:
parse_values(file3_lines_list[1])

[45230.0, 48.0, 0.07, 4300.0]

In [42]:
file3_lines_list[2]

'883000,16,0.14,\n'

In [43]:
parse_values(file3_lines_list[2])

ValueError: could not convert string to float: ''

The code above leads to a `ValueError` because the empty string `''` cannot be converted to a float. We can enhance the `parse_values` function to handle this *edge case*. 

We will also handle the case where the value is not a float.

### Debugging in `parse_values` function:

In [44]:
def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(0.0)
        else:
            try:
                values.append(float(item))
            except ValueError:
                values.append(item)
    return values

In [45]:
file3_lines_list[1]

'45230,48,0.07,4300\n'

In [46]:
parse_values(file3_lines_list[1])

[45230.0, 48.0, 0.07, 4300.0]

In [47]:
file3_lines_list[2]

'883000,16,0.14,\n'

In [48]:
parse_values(file3_lines_list[2])

[883000.0, 16.0, 0.14, 0.0]

##### Next, let's define a function `create_item_dict` that takes a list of values and a list of headers as inputs and returns a dictionary with the values associated with their respective headers as keys.

### writing `create_item_dict` function

In [75]:
def create_item_dict(headers, values):
    result = {}
    for h,v in zip(headers, values):
        result[h] = v
    return result

In [76]:
for key, value in zip(['a','b','c'], [1, 2, 3]):
    print(key, ' = ', value)

a  =  1
b  =  2
c  =  3


In [77]:
for item in zip(['a', 'b', 'c'], [1, 2, 3]):
    print(item)

('a', 1)
('b', 2)
('c', 3)
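Since `zip` produces key-value pairs, the same result can be obtained by passing it straight to `dict`. The sketch below also shows that `zip` stops at the shorter input, which is why a line with too many fields, such as `'323000,27,0.09,4720010000,36,0.08,20000\n'` in `file3_lines_list`, still produces a four-key dictionary. The six-value list is made up for the illustration.

```python
headers = ['amount', 'duration', 'rate', 'down_payment']

# dict() accepts the pairs produced by zip directly.
loan = dict(zip(headers, [45230.0, 48.0, 0.07, 4300.0]))

# zip stops at the shorter input, so surplus values are dropped silently
# (the six values here are invented for the demo).
extra = dict(zip(headers, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))

print(loan)
print(extra)
```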


In [78]:
file3_lines_list[1]

'45230,48,0.07,4300\n'

In [79]:
file3_value1 = parse_values(file3_lines_list[1])

In [80]:
file3_value1

[45230.0, 48.0, 0.07, 4300.0]

In [81]:
file3_headers

['amount', 'duration', 'rate', 'down_payment']

In [82]:
create_item_dict(file3_value1, file3_headers)   # arguments swapped: the values become the keys

{45230.0: 'amount', 48.0: 'duration', 0.07: 'rate', 4300.0: 'down_payment'}

In [84]:
create_item_dict(values=file3_value1, 
                 headers=file3_headers)

{'amount': 45230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}

In [85]:
file3_lines_list[2]

'883000,16,0.14,\n'

In [86]:
file3_value2 = parse_values(file3_lines_list[2])

In [87]:
file3_value2

[883000.0, 16.0, 0.14, 0.0]

In [88]:
create_item_dict(file3_headers, file3_value2)

{'amount': 883000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0}

##### As expected, the values & header are combined to create a dictionary with the appropriate key-value pairs.

##### We are now ready to put it all together and define the `read_csv` function.

### writing `read_csv` function

In [None]:
def read_csv(path):
    result = []
    # open the file in read mode.
    with open(path, 'r') as f:
        lines = f.readlines()
        # parse the header through 'parse_headers' function
        headers = parse_headers(lines[0])
        # loop over the remaining lines to parse values
        for data_line in lines[1:]:
            # parse the list of values to clean & split
            values = parse_values(data_line)
            # create dictionary using 'header list' & 'values list'
            item_dict = create_item_dict(headers, values)
            # appending the dictionary to the 'result list'
            result.append(item_dict)
    return result

In [92]:
with open('./data/loans_2.txt') as f2:
    f2_contents = f2.read()
    print(f2_contents)

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


In [93]:
read_csv('./data/loans_2.txt')

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0},
 {'amount': 4633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.08, 'down_payment': 8900.0},
 {'amount': 983000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0},
 {'amount': 15230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}]

In [94]:
len(read_csv('./data/loans_2.txt'))

5

### Combining All helping functions

The file is read and converted to a list of dictionaries, as expected. The `read_csv` function is generic enough that it can parse simple CSV files with any number of rows or columns. Here's the full code for `read_csv` along with the helper functions:

In [9]:
def parse_headers(header_line):
    return header_line.strip().split(',')

def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(0.0)
        else:
            try:
                values.append(float(item))
            except ValueError:
                values.append(item)
    return values

def create_item_dict(headers, values):
    result = {}
    for h,v in zip(headers, values):
        result[h] = v
    return result

def read_csv(path):
    result = []
    # open the file in read mode.
    with open(path, 'r') as f:
        lines = f.readlines()
        # parse the header through 'parse_headers' function
        headers = parse_headers(lines[0])
        # loop over the remaining lines to parse values
        for data_line in lines[1:]:
            # parse the list of values to clean & split
            values = parse_values(data_line)
            # create dictionary using 'header list' & 'values list'
            item_dict = create_item_dict(headers, values)
            # appending the dictionary to the 'result list'
            result.append(item_dict)
    return result

In [96]:
read_csv('./data/loans_1.txt')

[{'amount': 100000.0, 'duration': 36.0, 'rate': 0.08, 'down_payment': 20000.0},
 {'amount': 200000.0, 'duration': 12.0, 'rate': 0.1, 'down_payment': 0.0},
 {'amount': 628400.0,
  'duration': 120.0,
  'rate': 0.12,
  'down_payment': 100000.0},
 {'amount': 4637400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.07, 'down_payment': 8900.0},
 {'amount': 916000.0, 'duration': 16.0, 'rate': 0.13, 'down_payment': 0.0},
 {'amount': 45230.0, 'duration': 48.0, 'rate': 0.08, 'down_payment': 4300.0},
 {'amount': 991360.0, 'duration': 99.0, 'rate': 0.08, 'down_payment': 0.0},
 {'amount': 423000.0, 'duration': 27.0, 'rate': 0.09, 'down_payment': 47200.0}]

### writing `loan_emi` function

In [97]:
import math

def loan_emi(amount, duration, rate, down_payment = 0):
    loan_amount = amount - down_payment
    try:
        emi = loan_amount*rate*((1+rate)**duration)/(((1+rate)**duration)-1)
    except ZeroDivisionError:
        emi = loan_amount / duration
    emi = math.ceil(emi)
    return emi
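Here is a self-contained check of the function above. The definition is repeated so the block runs on its own, and the formula it implements is written out in the docstring. The zero-rate call uses invented numbers to exercise the `ZeroDivisionError` branch; the second call uses the first row of `loans_2.txt`.

```python
import math

def loan_emi(amount, duration, rate, down_payment=0):
    """EMI = P * r * (1 + r)**n / ((1 + r)**n - 1), rounded up,
    where P is the amount minus the down payment, r the rate, n the duration."""
    loan_amount = amount - down_payment
    try:
        emi = loan_amount * rate * ((1 + rate) ** duration) / (((1 + rate) ** duration) - 1)
    except ZeroDivisionError:
        # rate == 0 makes the denominator zero; fall back to equal instalments
        emi = loan_amount / duration
    return math.ceil(emi)

zero_rate = loan_emi(1200, 12, 0)                  # invented numbers, zero-rate branch
first_loan = loan_emi(828400, 120, 0.11, 100000)   # first row of loans_2.txt

print(zero_rate, first_loan)
```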

In [98]:
loans_2 = read_csv('./data/loans_2.txt')

In [99]:
loans_2

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0},
 {'amount': 4633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.08, 'down_payment': 8900.0},
 {'amount': 983000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0},
 {'amount': 15230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}]

### running a loop to add `emi` column to dictionary

In [None]:
for loan in loans_2:
    loan['emi'] = loan_emi(amount=loan['amount'],
                           duration=loan['duration'],
                           rate=loan['rate'],
                           down_payment=loan['down_payment'])

In [101]:
loans_2

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0,
  'emi': 80125},
 {'amount': 4633400.0,
  'duration': 240.0,
  'rate': 0.06,
  'down_payment': 0.0,
  'emi': 278005},
 {'amount': 42900.0,
  'duration': 90.0,
  'rate': 0.08,
  'down_payment': 8900.0,
  'emi': 2723},
 {'amount': 983000.0,
  'duration': 16.0,
  'rate': 0.14,
  'down_payment': 0.0,
  'emi': 156902},
 {'amount': 15230.0,
  'duration': 48.0,
  'rate': 0.07,
  'down_payment': 4300.0,
  'emi': 797}]

You can see that each loan now has a new key `emi`, which provides the EMI for the loan. We can extract this logic into a function so that we can use it for other files too.

### writing `compute_emis` function for all files

In [102]:
def compute_emis(loans):
    for loan in loans:
        # calling 'loan_emi' function
        loan['emi'] = loan_emi(
            amount = loan['amount'],
            duration = loan['duration'],
            down_payment=loan['down_payment'],
            rate = loan['rate']
        )

## Writing to files

Now that we have performed some processing on the data, it would be good to write the results back to a CSV file. 

We can create/open a file in `w` mode using `open` and write to it using the `.write` method. 

The string `format` method will come in handy here.
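As a quick illustration of how `format` can build one CSV line, here is a minimal sketch using the first loan from `loans_2.txt` with its computed EMI. The `joined` variant is just one possible alternative, not something the rest of this section relies on.

```python
loan = {'amount': 828400.0, 'duration': 120.0, 'rate': 0.11,
        'down_payment': 100000.0, 'emi': 80125}

# Each {} placeholder is filled by the matching positional argument.
line = '{},{},{},{},{}\n'.format(
    loan['amount'], loan['duration'], loan['rate'],
    loan['down_payment'], loan['emi'])

# Alternative: convert every value to a string and join with commas.
joined = ','.join(str(v) for v in loan.values()) + '\n'

print(repr(line))
```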

In [103]:
loans2 = read_csv('./data/loans_2.txt')

In [104]:
loans2

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0},
 {'amount': 4633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.08, 'down_payment': 8900.0},
 {'amount': 983000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0},
 {'amount': 15230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}]

In [105]:
compute_emis(loans2)

In [106]:
loans2

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0,
  'emi': 80125},
 {'amount': 4633400.0,
  'duration': 240.0,
  'rate': 0.06,
  'down_payment': 0.0,
  'emi': 278005},
 {'amount': 42900.0,
  'duration': 90.0,
  'rate': 0.08,
  'down_payment': 8900.0,
  'emi': 2723},
 {'amount': 983000.0,
  'duration': 16.0,
  'rate': 0.14,
  'down_payment': 0.0,
  'emi': 156902},
 {'amount': 15230.0,
  'duration': 48.0,
  'rate': 0.07,
  'down_payment': 4300.0,
  'emi': 797}]

In [107]:
os.listdir('./data')

['loans_3.txt', 'loans_1.txt', 'loans_2.txt']

In [108]:
with open('./data/loans_2.txt') as f:
    print(f.read())

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


In [109]:
loans2

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0,
  'emi': 80125},
 {'amount': 4633400.0,
  'duration': 240.0,
  'rate': 0.06,
  'down_payment': 0.0,
  'emi': 278005},
 {'amount': 42900.0,
  'duration': 90.0,
  'rate': 0.08,
  'down_payment': 8900.0,
  'emi': 2723},
 {'amount': 983000.0,
  'duration': 16.0,
  'rate': 0.14,
  'down_payment': 0.0,
  'emi': 156902},
 {'amount': 15230.0,
  'duration': 48.0,
  'rate': 0.07,
  'down_payment': 4300.0,
  'emi': 797}]

In [111]:
with open('./data/loans_2.txt', 'w') as f:
    for loan in loans2:
        f.write('{},{},{},{},{}\n'.format(
            loan['amount'],
            loan['duration'],
            loan['rate'],
            loan['down_payment'],
            loan['emi']
        ))

In [112]:
os.listdir('./data')

['loans_3.txt', 'loans_1.txt', 'loans_2.txt']

In [113]:
with open('./data/loans_2.txt') as f:
    print(f.read())

828400.0,120.0,0.11,100000.0,80125
4633400.0,240.0,0.06,0.0,278005
42900.0,90.0,0.08,8900.0,2723
983000.0,16.0,0.14,0.0,156902
15230.0,48.0,0.07,4300.0,797



Great, looks like the loan details (along with the computed EMIs) were written into the file.

##### Let's define a generic function `write_csv` which takes a list of dictionaries and writes it to a file in CSV format. We will also include the column headers in the first line.

In [23]:
# here "items" is a "list of dictionaries".
def write_csv(items, path):
    # open the file in write mode.
    with open(path, 'w') as f:
        if len(items) == 0:
            return
        
        # firstly, write the header in the file.
        headers = list(items[0].keys())
        f.write(','.join(headers)+'\n')
        
        # now, write one item per line.
        for item in items:
            values = []
            for h in headers:
                values.append(str(item.get(h, '')))
            f.write(','.join(values) + '\n')

Do you understand how the function works? If not, try executing each statement line by line in a separate cell to figure out how it works.

Let's try it out!

In [115]:
loans3 = read_csv('./data/loans_3.txt')

In [117]:
loans3

[{'amount': 45230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0},
 {'amount': 883000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0},
 {'amount': 100000.0, 'duration': 12.0, 'rate': 0.1, 'down_payment': 0.0},
 {'amount': 728400.0,
  'duration': 120.0,
  'rate': 0.12,
  'down_payment': 100000.0},
 {'amount': 3637400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 82900.0, 'duration': 90.0, 'rate': 0.07, 'down_payment': 8900.0},
 {'amount': 316000.0, 'duration': 16.0, 'rate': 0.13, 'down_payment': 0.0},
 {'amount': 15230.0, 'duration': 48.0, 'rate': 0.08, 'down_payment': 4300.0},
 {'amount': 991360.0, 'duration': 99.0, 'rate': 0.08, 'down_payment': 0.0},
 {'amount': 323000.0,
  'duration': 27.0,
  'rate': 0.09,
  'down_payment': 4720010000.0},
 {'amount': 528400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0},
 {'amount': 8633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 12900.0, 'duration': 90.0, '

In [118]:
compute_emis(loans3)

In [119]:
loans3

[{'amount': 45230.0,
  'duration': 48.0,
  'rate': 0.07,
  'down_payment': 4300.0,
  'emi': 2981},
 {'amount': 883000.0,
  'duration': 16.0,
  'rate': 0.14,
  'down_payment': 0.0,
  'emi': 140941},
 {'amount': 100000.0,
  'duration': 12.0,
  'rate': 0.1,
  'down_payment': 0.0,
  'emi': 14677},
 {'amount': 728400.0,
  'duration': 120.0,
  'rate': 0.12,
  'down_payment': 100000.0,
  'emi': 75409},
 {'amount': 3637400.0,
  'duration': 240.0,
  'rate': 0.06,
  'down_payment': 0.0,
  'emi': 218245},
 {'amount': 82900.0,
  'duration': 90.0,
  'rate': 0.07,
  'down_payment': 8900.0,
  'emi': 5192},
 {'amount': 316000.0,
  'duration': 16.0,
  'rate': 0.13,
  'down_payment': 0.0,
  'emi': 47851},
 {'amount': 15230.0,
  'duration': 48.0,
  'rate': 0.08,
  'down_payment': 4300.0,
  'emi': 897},
 {'amount': 991360.0,
  'duration': 99.0,
  'rate': 0.08,
  'down_payment': 0.0,
  'emi': 79348},
 {'amount': 323000.0,
  'duration': 27.0,
  'rate': 0.09,
  'down_payment': 4720010000.0,
  'emi': -4707175

In [120]:
write_csv(loans3, './data/loans_3.txt')

In [121]:
with open('./data/loans_3.txt', 'r') as f:
    print(f.read())

amount,duration,rate,down_payment,emi
45230.0,48.0,0.07,4300.0,2981
883000.0,16.0,0.14,0.0,140941
100000.0,12.0,0.1,0.0,14677
728400.0,120.0,0.12,100000.0,75409
3637400.0,240.0,0.06,0.0,218245
82900.0,90.0,0.07,8900.0,5192
316000.0,16.0,0.13,0.0,47851
15230.0,48.0,0.08,4300.0,897
991360.0,99.0,0.08,0.0,79348
323000.0,27.0,0.09,4720010000.0,-470717536
528400.0,120.0,0.11,100000.0,47125
8633400.0,240.0,0.06,0.0,518005
12900.0,90.0,0.08,8900.0,321



#### With just four lines of code, we can now read each downloaded file, calculate the EMIs, and write the results back to new files:

In [143]:
for i in range(1, 4):
    loans = read_csv('./data/loans_{}.txt'.format(i))
    compute_emis(loans)
    write_csv(loans, './data/emis_{}.txt'.format(i))

In [144]:
os.listdir('./data')

['loans_3.txt',
 'loans_1.txt',
 'emis_2.txt',
 'loans_2.txt',
 'emis_3.txt',
 'emis_1.txt']

### Using Pandas to Read and Write CSVs

There are some limitations to the `read_csv` and `write_csv` functions we've defined above:

* The `read_csv` function fails to create a proper dictionary if any of the values in the CSV files contains commas
* The `write_csv` function fails to create a proper CSV if any of the values to be written contains commas

When a value in a CSV file contains a comma (`,`), the value is generally placed within double quotes. Double quotes (`"`) in values are converted into two double quotes (`""`). Here's an example:

```
title,description
Fast & Furious,"A movie, a race, a franchise"
The Dark Knight,"Gotham, the ""Batman"", and the Joker"
Memento,A guy forgets everything every 15 minutes

```
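The quoting rule described above can be checked without any downloads by using Python's built-in `csv` module on the same text. This is only a sketch for comparison: `naive` splits on every comma, while `csv.reader` honours the double quotes.

```python
import csv
import io

raw = '''title,description
Fast & Furious,"A movie, a race, a franchise"
The Dark Knight,"Gotham, the ""Batman"", and the Joker"
Memento,A guy forgets everything every 15 minutes
'''

# Splitting on every comma breaks the quoted descriptions apart.
naive = [line.split(',') for line in raw.splitlines()]

# csv.reader treats a quoted field as one value and turns "" back into ".
parsed = list(csv.reader(io.StringIO(raw)))

print(naive[1])
print(parsed[1])
print(parsed[2])
```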

Let's try it out.

In [1]:
movies_url = "https://gist.githubusercontent.com/aakashns/afee0a407d44bbc02321993548021af9/raw/6d7473f0ac4c54aca65fc4b06ed831b8a4840190/movies.csv"

In [2]:
from urllib.request import urlretrieve

In [4]:
import os
os.makedirs('./data')

In [6]:
'data' in os.listdir('.')

True

In [5]:
os.listdir('./data')

[]

In [7]:
urlretrieve(movies_url, './data/movies.csv')

('./data/movies.csv', <http.client.HTTPMessage at 0x7fbf457d14c0>)

In [10]:
movies = read_csv('./data/movies.csv')

In [11]:
movies

[{'title': 'Fast & Furious', 'description': '"A movie'},
 {'title': 'The Dark Knight', 'description': '"Gotham'},
 {'title': 'Memento',
  'description': 'A guy forgets everything every 15 minutes'}]

#### As you can see above, the movie descriptions weren't parsed properly.

#### To read this CSV properly, we can use the `pandas` library.

In [19]:
!pip install pandas --upgrade --quiet

In [13]:
import pandas as pd

The `pd.read_csv` function can be used to read the CSV file into a pandas data frame: a spreadsheet-like object for analyzing and processing data. We'll learn more about data frames in a future lesson.

In [14]:
movies_data_fram = pd.read_csv('./data/movies.csv')

In [15]:
movies_data_fram

|   | title | description |
|---|---|---|
| 0 | Fast & Furious | A movie, a race, a franchise |
| 1 | The Dark Knight | Gotham, the "Batman", and the Joker |
| 2 | Memento | A guy forgets everything every 15 minutes |


#### A dataframe can be converted into a list of dictionaries using the `to_dict` method.

In [17]:
movies = movies_data_fram.to_dict('records')

In [18]:
movies

[{'title': 'Fast & Furious', 'description': 'A movie, a race, a franchise'},
 {'title': 'The Dark Knight',
  'description': 'Gotham, the "Batman", and the Joker'},
 {'title': 'Memento',
  'description': 'A guy forgets everything every 15 minutes'}]

#### If you don't pass the argument `'records'`, you get a dictionary of dictionaries instead, with one inner dictionary per column, keyed by row index.

In [20]:
movies_dict = movies_data_fram.to_dict()

In [21]:
movies_dict

{'title': {0: 'Fast & Furious', 1: 'The Dark Knight', 2: 'Memento'},
 'description': {0: 'A movie, a race, a franchise',
  1: 'Gotham, the "Batman", and the Joker',
  2: 'A guy forgets everything every 15 minutes'}}

#### Let's try using the `write_csv` function to write the data in `movies` back to a CSV file.

In [24]:
write_csv(movies, './data/movies_2.csv')

In [25]:
with open('./data/movies_2.csv', 'r') as f:
    print(f.read())

title,description
Fast & Furious,A movie, a race, a franchise
The Dark Knight,Gotham, the "Batman", and the Joker
Memento,A guy forgets everything every 15 minutes



#### As you can see above, the CSV file is not formatted properly. This can be verified by attempting to read the file using `pd.read_csv`.

In [26]:
pd.read_csv('./data/movies_2.csv')

|   |   | title | description |
|---|---|---|---|
| Fast & Furious | A movie | a race | a franchise |
| The Dark Knight | Gotham | the "Batman" | and the Joker |
| Memento | A guy forgets everything every 15 minutes | NaN | NaN |
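The standard library's `csv` module can also write values containing commas or quotes correctly. Here is a minimal sketch with `csv.DictWriter`, writing to an in-memory buffer instead of a real file; the two-row `movies` list is copied from the data above.

```python
import csv
import io

movies = [
    {'title': 'Fast & Furious', 'description': 'A movie, a race, a franchise'},
    {'title': 'The Dark Knight', 'description': 'Gotham, the "Batman", and the Joker'},
]

buffer = io.StringIO()   # stands in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=['title', 'description'])
writer.writeheader()     # first line: the column names
writer.writerows(movies) # one line per dictionary, quoted where needed

written = buffer.getvalue()

# Reading the text back recovers the original dictionaries.
round_trip = list(csv.DictReader(io.StringIO(written)))

print(written)
```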


#### To convert a list of dictionaries into a dataframe, you can use the `pd.DataFrame` constructor.

In [27]:
movies

[{'title': 'Fast & Furious', 'description': 'A movie, a race, a franchise'},
 {'title': 'The Dark Knight',
  'description': 'Gotham, the "Batman", and the Joker'},
 {'title': 'Memento',
  'description': 'A guy forgets everything every 15 minutes'}]

In [28]:
df2 = pd.DataFrame(movies)

In [29]:
df2

|   | title | description |
|---|---|---|
| 0 | Fast & Furious | A movie, a race, a franchise |
| 1 | The Dark Knight | Gotham, the "Batman", and the Joker |
| 2 | Memento | A guy forgets everything every 15 minutes |


#### It can now be written to a CSV file using the `.to_csv` method of a dataframe.

In [30]:
df2.to_csv('./data/movies_3.csv', index=False)

In [32]:
with open('./data/movies_3.csv') as f:
    print(f.read())

title,description
Fast & Furious,"A movie, a race, a franchise"
The Dark Knight,"Gotham, the ""Batman"", and the Joker"
Memento,A guy forgets everything every 15 minutes



#### The CSV file is formatted properly. We can verify this by trying to read it back.

In [33]:
pd.read_csv('./data/movies_3.csv')

|   | title | description |
|---|---|---|
| 0 | Fast & Furious | A movie, a race, a franchise |
| 1 | The Dark Knight | Gotham, the "Batman", and the Joker |
| 2 | Memento | A guy forgets everything every 15 minutes |


## Exercise - Processing CSV files using a dictionary of lists

We defined the functions `read_csv` and `write_csv` above to convert a CSV file into a list of dictionaries and vice versa. In this exercise, you'll transform the CSV data into a dictionary of lists instead, with one list for each column in the file.

For example, consider the following CSV file:

```
amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300
```

We'll convert it into the following dictionary of lists:

```
{
  'amount': [828400, 4633400, 42900, 983000, 15230],
  'duration': [120, 240, 90, 16, 48],
  'rate': [0.11, 0.06, 0.08, 0.14, 0.07],
  'down_payment': [100000, 0, 8900, 0, 4300]
}
```
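To make the target shape concrete, here is one possible way to turn a list of dictionaries (the format returned by `read_csv` earlier) into a dictionary of lists. The helper name `to_columns` is hypothetical, and this is only a sketch of the data layout, not a full solution to the tasks below.

```python
rows = [
    {'amount': 828400.0, 'duration': 120.0, 'rate': 0.11, 'down_payment': 100000.0},
    {'amount': 4633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
]

def to_columns(rows):
    """Hypothetical helper: list of dictionaries -> dictionary of lists.

    Assumes every row has the same keys as the first one.
    """
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}

columns = to_columns(rows)
print(columns)
```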

Complete the following tasks using the empty cells below:

1. Download three CSV files to the folder `data2` using the URLs listed in the code cell below, and verify the downloaded files.
2. Define a function `read_csv_columnar` that reads a CSV file and returns a dictionary of lists in the format shown above. 
3. Define a function `compute_emis` that adds another key `emi` into the dictionary with a list of EMIs computed for each row of data.
4. Define a function `write_csv_columnar` that writes the data from the dictionary of lists into a correctly formatted CSV file.
5. Process all three downloaded files and write the results by creating new files in the directory `data2`.

Define helper functions wherever required.

### Q.1: Download three CSV files to the folder data2 using the URLs listed in the code cell below, and verify the downloaded files.

In [1]:
url1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [2]:
import os

In [3]:
os.makedirs('./data2')

In [4]:
os.listdir('./data2')

[]

In [5]:
from urllib.request import urlretrieve

In [6]:
urlretrieve(url1, './data2/loan_1.txt')

('./data2/loan_1.txt', <http.client.HTTPMessage at 0x7f43a2340e50>)

In [7]:
urlretrieve(url2, './data2/loan_2.txt')

('./data2/loan_2.txt', <http.client.HTTPMessage at 0x7f43a2340cd0>)

In [8]:
urlretrieve(url3, './data2/loan_3.txt')

('./data2/loan_3.txt', <http.client.HTTPMessage at 0x7f43a02d3040>)

In [9]:
os.listdir('./data2')

['loan_3.txt', 'loan_2.txt', 'loan_1.txt']

In [10]:
import pandas as pd
df1 = pd.read_csv('./data2/loan_1.txt')

In [11]:
df1

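|   | amount | duration | rate | down_payment |
|---|---|---|---|---|
| 0 | 100000 | 36 | 0.08 | 20000.0 |
| 1 | 200000 | 12 | 0.1 | NaN |
| 2 | 628400 | 120 | 0.12 | 100000.0 |
| 3 | 4637400 | 240 | 0.06 | NaN |
| 4 | 42900 | 90 | 0.07 | 8900.0 |
| 5 | 916000 | 16 | 0.13 | NaN |
| 6 | 45230 | 48 | 0.08 | 4300.0 |
| 7 | 991360 | 99 | 0.08 | NaN |
| 8 | 423000 | 27 | 0.09 | 47200.0 |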


In [12]:
df1_dict = df1.to_dict('list')

In [13]:
df1_dict

{'amount': [100000,
  200000,
  628400,
  4637400,
  42900,
  916000,
  45230,
  991360,
  423000],
 'duration': [36, 12, 120, 240, 90, 16, 48, 99, 27],
 'rate': [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 'down_payment': [20000.0,
  nan,
  100000.0,
  nan,
  8900.0,
  nan,
  4300.0,
  nan,
  47200.0]}

In [14]:
len(df1_dict['amount'])

9

In [15]:
os.listdir('./data2')

['loan_3.txt', 'loan_2.txt', 'loan_1.txt']

### Q.2: Define a function `read_csv_columnar` that reads a CSV file and returns a dictionary of lists in the format shown above.

In [16]:
def read_csv_columnar(path):
    df = pd.read_csv(path)
    return df.to_dict('list')

In [17]:
# directly reading the data from the file.
read_csv_columnar('./data2/loan_1.txt')

{'amount': [100000,
  200000,
  628400,
  4637400,
  42900,
  916000,
  45230,
  991360,
  423000],
 'duration': [36, 12, 120, 240, 90, 16, 48, 99, 27],
 'rate': [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 'down_payment': [20000.0,
  nan,
  100000.0,
  nan,
  8900.0,
  nan,
  4300.0,
  nan,
  47200.0]}
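
For comparison, the same columnar structure can be built without pandas. This is only a sketch: the name `read_csv_columnar_plain` is hypothetical, and it assumes every field is numeric and that an empty field (such as a missing `down_payment`) should be read as `0.0` rather than `nan`.

```python
import csv

def read_csv_columnar_plain(path):
    """Read a CSV file into a dictionary of lists using only the standard library.

    Assumptions: all values are numeric; an empty field is stored as 0.0.
    """
    with open(path, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
        columns = {name: [] for name in headers}
        for row in reader:
            if not row:
                continue  # skip blank lines
            if len(row) != len(headers):
                # refuse rows that do not match the header instead of truncating them
                raise ValueError('line {}: expected {} fields, saw {}'.format(
                    reader.line_num, len(headers), len(row)))
            for name, field in zip(headers, row):
                columns[name].append(float(field) if field.strip() else 0.0)
    return columns
```

Unlike the pandas version, this one returns every value as a `float`, so integer columns such as `amount` lose their `int` type.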

### Q.3: Define a function `compute_emis` that adds another key `emi` into the dictionary with a list of EMIs computed for each row of data.

In [18]:
def compute_emis(path):
    import math

    df_dict = read_csv_columnar(path)

    # helper: EMI for a single loan; `rate` is used as given, once per period of `duration`.
    def compute_emi(amount, duration, rate, down_payment = 0):
        loan_amount = amount - down_payment
        try:
            emi = loan_amount*rate*((1+rate)**duration)/(((1+rate)**duration)-1)
        except ZeroDivisionError:
            # a rate of 0 makes the denominator 0; repay the principal in equal parts.
            emi = loan_amount / duration
        emi = math.ceil(emi)
        return emi
    # end of internal function.

    emi_list = []
    limit = len(df_dict['amount'])
    for i in range(limit):
        try:
            each_emi = compute_emi(df_dict['amount'][i], df_dict['duration'][i], df_dict['rate'][i], df_dict['down_payment'][i])
        except ValueError:
            # a missing down payment is read as NaN, and math.ceil(NaN) raises ValueError;
            # treat it as 0 and compute the EMI again.
            df_dict['down_payment'][i] = 0
            each_emi = compute_emi(df_dict['amount'][i], df_dict['duration'][i], df_dict['rate'][i], df_dict['down_payment'][i])
        emi_list.append(each_emi)
    df_dict['emi'] = emi_list
    return df_dict

In [19]:
dict_1 = compute_emis('./data2/loan_1.txt')

In [20]:
dict_1

{'amount': [100000,
  200000,
  628400,
  4637400,
  42900,
  916000,
  45230,
  991360,
  423000],
 'duration': [36, 12, 120, 240, 90, 16, 48, 99, 27],
 'rate': [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 'down_payment': [20000.0, 0, 100000.0, 0, 8900.0, 0, 4300.0, 0, 47200.0],
 'emi': [6828, 29353, 63409, 278245, 2386, 138707, 3358, 79348, 37481]}
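
The `compute_emis` function above relies on `math.ceil` raising `ValueError` when the down payment is `nan`. An alternative is to test for the missing value explicitly before computing. The sketch below assumes the same formula and the same per-period use of `rate`; `is_missing` and `add_emis` are hypothetical names, and `add_emis` takes the dictionary of lists directly instead of a file path.

```python
import math

def is_missing(value):
    """True for the NaN that pandas uses to mark an empty cell."""
    return isinstance(value, float) and math.isnan(value)

def compute_emi(amount, duration, rate, down_payment=0):
    """EMI for one loan, with `rate` applied once per period of `duration`."""
    loan_amount = amount - down_payment
    if rate == 0:
        # no interest: repay the principal in equal parts
        return math.ceil(loan_amount / duration)
    growth = (1 + rate) ** duration
    return math.ceil(loan_amount * rate * growth / (growth - 1))

def add_emis(columns):
    """Return a copy of the dict of lists with NaN down payments set to 0 and an 'emi' key added."""
    result = {name: list(values) for name, values in columns.items()}
    result['down_payment'] = [0 if is_missing(dp) else dp
                              for dp in result['down_payment']]
    result['emi'] = [compute_emi(a, d, r, dp)
                     for a, d, r, dp in zip(result['amount'], result['duration'],
                                            result['rate'], result['down_payment'])]
    return result
```

Because `add_emis` copies each list first, the input dictionary is left unchanged, which makes it easier to re-run cells out of order.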

In [21]:
len(dict_1)

5

In [22]:
len(dict_1.keys())

5

In [23]:
list(dict_1.keys())[0]

'amount'

In [24]:
len(dict_1['amount'])

9

In [25]:
dict_1.keys()

dict_keys(['amount', 'duration', 'rate', 'down_payment', 'emi'])

In [26]:
list(dict_1.keys())

['amount', 'duration', 'rate', 'down_payment', 'emi']

In [27]:
list_string = ','.join(list(dict_1.keys()))

In [28]:
list_string

'amount,duration,rate,down_payment,emi'

In [29]:
with open('./data2/loan_1.txt') as f:
    print(f.read())

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


### Q.4: Define a function `write_csv_columnar` that writes the data from the dictionary of lists into a correctly formatted CSV file.

In [122]:
def write_csv_columnar(passing_dict, path='./data2/loan_1.txt'):
    import csv
    
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='') as f:
        # creating csv writer object.
        writer_object = csv.writer(f)
        
        # writing dictionary keys as headers of csv file.
        writer_object.writerow(passing_dict.keys())
        
        # zip(*...) turns the column lists into rows; write one row per loan.
        writer_object.writerows(zip(*passing_dict.values()))
        

In [123]:
write_csv_columnar(dict_1)

In [124]:
read_csv_columnar('./data2/loan_1.txt')

{'amount': [100000,
  200000,
  628400,
  4637400,
  42900,
  916000,
  45230,
  991360,
  423000],
 'duration': [36, 12, 120, 240, 90, 16, 48, 99, 27],
 'rate': [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 'down_payment': [20000.0,
  0.0,
  100000.0,
  0.0,
  8900.0,
  0.0,
  4300.0,
  0.0,
  47200.0],
 'emi': [6828, 29353, 63409, 278245, 2386, 138707, 3358, 79348, 37481]}

In [125]:
df1 = pd.read_csv('./data2/loan_1.txt')

In [126]:
df1

Unnamed: 0,amount,duration,rate,down_payment,emi
0,100000,36,0.08,20000.0,6828
1,200000,12,0.1,0.0,29353
2,628400,120,0.12,100000.0,63409
3,4637400,240,0.06,0.0,278245
4,42900,90,0.07,8900.0,2386
5,916000,16,0.13,0.0,138707
6,45230,48,0.08,4300.0,3358
7,991360,99,0.08,0.0,79348
8,423000,27,0.09,47200.0,37481


In [30]:
dict_1

{'amount': [100000,
  200000,
  628400,
  4637400,
  42900,
  916000,
  45230,
  991360,
  423000],
 'duration': [36, 12, 120, 240, 90, 16, 48, 99, 27],
 'rate': [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 'down_payment': [20000.0, 0, 100000.0, 0, 8900.0, 0, 4300.0, 0, 47200.0],
 'emi': [6828, 29353, 63409, 278245, 2386, 138707, 3358, 79348, 37481]}

In [32]:
os.listdir('./data2')

['loan_3.txt', 'loan_2.txt', 'loan_1.txt']

In [108]:
def write_csv_columnar(dict_of_lists, path):
    with open(path, 'w') as f:
        # header row: the dictionary keys, comma-separated.
        headers = list(dict_of_lists.keys())
        f.write(','.join(headers) + '\n')

        # one line per row: take the j-th value from every column.
        columns = list(dict_of_lists.values())
        for j in range(len(columns[0])):
            values_list = [str(column[j]) for column in columns]
            f.write(','.join(values_list) + '\n')

In [109]:
write_csv_columnar(dict_1, './data2/loan_1.txt')

In [111]:
pd.read_csv('./data2/loan_1.txt')

Unnamed: 0,amount,duration,rate,down_payment,emi
0,100000,36,0.08,20000.0,6828
1,200000,12,0.1,0.0,29353
2,628400,120,0.12,100000.0,63409
3,4637400,240,0.06,0.0,278245
4,42900,90,0.07,8900.0,2386
5,916000,16,0.13,0.0,138707
6,45230,48,0.08,4300.0,3358
7,991360,99,0.08,0.0,79348
8,423000,27,0.09,47200.0,37481


In [100]:
list_1 = []

In [101]:
list_1.append(2)

In [102]:
list_1

[2]

In [34]:
list(dict_1.keys())

['amount', 'duration', 'rate', 'down_payment', 'emi']

In [47]:
abc = str(list(dict_1.keys())[0])
abc

'amount'

In [51]:
list(dict_1.values())

[[100000, 200000, 628400, 4637400, 42900, 916000, 45230, 991360, 423000],
 [36, 12, 120, 240, 90, 16, 48, 99, 27],
 [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 [20000.0, 0, 100000.0, 0, 8900.0, 0, 4300.0, 0, 47200.0],
 [6828, 29353, 63409, 278245, 2386, 138707, 3358, 79348, 37481]]

In [54]:
abc = len(list(dict_1.values())[0])
abc

9

In [58]:
list(dict_1.values())[0]

[100000, 200000, 628400, 4637400, 42900, 916000, 45230, 991360, 423000]

In [78]:
list(dict_1.values())[1]

[36, 12, 120, 240, 90, 16, 48, 99, 27]

In [79]:
list(dict_1.values())[2]

[0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09]

In [81]:
list(dict_1.values())[3]

[20000.0, 0, 100000.0, 0, 8900.0, 0, 4300.0, 0, 47200.0]

In [82]:
list(dict_1.values())[4]

[6828, 29353, 63409, 278245, 2386, 138707, 3358, 79348, 37481]

In [105]:
my_list = []
for i in range(len(list(dict_1.values())[0])):
    for j in range(len(dict_1)):
        my_list.append(list(dict_1.values())[j][i])
    print(my_list, '\n')
    my_list = []
#         print(list(dict_1.values())[j][i])

[100000, 36, 0.08, 20000.0, 6828] 

[200000, 12, 0.1, 0, 29353] 

[628400, 120, 0.12, 100000.0, 63409] 

[4637400, 240, 0.06, 0, 278245] 

[42900, 90, 0.07, 8900.0, 2386] 

[916000, 16, 0.13, 0, 138707] 

[45230, 48, 0.08, 4300.0, 3358] 

[991360, 99, 0.08, 0, 79348] 

[423000, 27, 0.09, 47200.0, 37481] 
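
The nested loop above collects the i-th value from every column to form one row. The built-in `zip` can do the same transposition in one step. A small sketch on a shortened, made-up subset of the data (two rows, three columns):

```python
# A shortened example of the dictionary of lists used above.
columns = {
    'amount': [100000, 200000],
    'duration': [36, 12],
    'rate': [0.08, 0.1],
}

# zip(*...) takes one item from every column at a time, yielding one row per step.
rows = [list(row) for row in zip(*columns.values())]

# Build the CSV text lines: the header first, then one comma-separated line per row.
lines = [','.join(columns.keys())]
lines += [','.join(str(value) for value in row) for row in rows]

for line in lines:
    print(line)
```

Note that `zip` stops at the shortest column, so columns of unequal length would be silently truncated.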



In [59]:
list(dict_1.values())[0][0]

100000

In [55]:
for i in range(abc):
    print(i)

0
1
2
3
4
5
6
7
8


In [154]:
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [155]:
from urllib.request import urlretrieve

In [156]:
import os

In [161]:
os.makedirs('./data', exist_ok=True)

In [162]:
os.listdir('./data')

[]

In [163]:
urlretrieve(url3, './data/loan_3.txt')

('./data/loan_3.txt', <http.client.HTTPMessage at 0x7f43576b77f0>)

In [164]:
with open('./data/loan_3.txt') as f:
    print(f.read())

amount,duration,rate,down_payment
45230,48,0.07,4300
883000,16,0.14,
100000,12,0.1,
728400,120,0.12,100000
3637400,240,0.06,
82900,90,0.07,8900
316000,16,0.13,
15230,48,0.08,4300
991360,99,0.08,
323000,27,0.09,4720010000,36,0.08,20000
528400,120,0.11,100000
8633400,240,0.06,
12900,90,0.08,8900


In [165]:
import pandas as pd

In [166]:
pd.read_csv('./data/loan_3.txt')

ParserError: Error tokenizing data. C error: Expected 4 fields in line 11, saw 7
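
The error points at line 11 of the file, which in the printed contents above is `323000,27,0.09,4720010000,36,0.08,20000`. It appears that two records were joined without a line break. One way to locate such rows before handing a file to pandas is to count the fields on each line. The sketch below uses a hypothetical helper, `find_bad_rows`, on a shortened copy of the text:

```python
import csv
import io

def find_bad_rows(text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    expected = len(next(reader))
    return [(reader.line_num, len(row))
            for row in reader
            if row and len(row) != expected]

# Shortened copy of the file: header, one good row, the fused row, one good row.
sample = (
    'amount,duration,rate,down_payment\n'
    '991360,99,0.08,\n'
    '323000,27,0.09,4720010000,36,0.08,20000\n'
    '528400,120,0.11,100000\n'
)
bad = find_bad_rows(sample)
print(bad)
```

The sketch only reports the problem; deciding how to split the fused row is left to the reader, since the file itself does not say where the first record ends.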


### Q.5: Process all three downloaded files and write the results by creating new files in the directory `data_folder`.

In [140]:
url1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

import os
os.makedirs('./data_folder', exist_ok=True)

from urllib.request import urlretrieve

urlretrieve(url1, './data_folder/loan_1.txt')
urlretrieve(url2, './data_folder/loan_2.txt')
urlretrieve(url3, './data_folder/loan_3.txt')

os.listdir('./data_folder')

['loan_3.txt', 'loan_2.txt', 'loan_1.txt']

In [141]:
# Read a CSV file and return its contents as a dictionary of lists.
def read_csv_columnar(path):
    df = pd.read_csv(path)
    return df.to_dict('list')
# end of 'read_csv_columnar' function.


In [142]:
# Compute the EMI for every loan in the file and return the data with an added 'emi' key.
def compute_emis(path):    
    df_dict = read_csv_columnar(path)

    # writing 'compute_emi' function inside 'compute_emis' function.
    import math
    def compute_emi(amount, duration, rate, down_payment = 0):
        loan_amount = amount - down_payment
        try:
            emi = loan_amount*rate*((1+rate)**duration)/(((1+rate)**duration)-1)
        except ZeroDivisionError:
            emi = loan_amount / duration
        emi = math.ceil(emi)
        return emi
    # end of internal function.
    
    emi_list = []
    limit = len(df_dict['amount'])
    for i in range(limit):
        try:
            each_emi = compute_emi(df_dict['amount'][i], df_dict['duration'][i], df_dict['rate'][i], df_dict['down_payment'][i])
        except ValueError:
            # a missing down payment is read as NaN, and math.ceil(NaN) raises ValueError;
            # treat it as 0 and compute the EMI again.
            df_dict['down_payment'][i] = 0
            each_emi = compute_emi(df_dict['amount'][i], df_dict['duration'][i], df_dict['rate'][i], df_dict['down_payment'][i])
        emi_list.append(each_emi)
    df_dict['emi'] = emi_list

    return df_dict
# end of 'compute_emis' function.

In [143]:
# Compute the EMIs for the file at `path`, then overwrite that file with the 'emi' column added.
def write_csv_columnar(path):
    dict_of_lists = compute_emis(path)
    with open(path, 'w') as f:
        # header row: the dictionary keys, comma-separated.
        headers = list(dict_of_lists.keys())
        f.write(','.join(headers) + '\n')

        # one line per row: take the j-th value from every column.
        columns = list(dict_of_lists.values())
        for j in range(len(columns[0])):
            values_list = [str(column[j]) for column in columns]
            f.write(','.join(values_list) + '\n')
# end of 'write_csv_columnar' function.

In [152]:
path1 = './data_folder/loan_1.txt'
read_csv_columnar(path1)

{'amount': [100000,
  200000,
  628400,
  4637400,
  42900,
  916000,
  45230,
  991360,
  423000],
 'duration': [36, 12, 120, 240, 90, 16, 48, 99, 27],
 'rate': [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 'down_payment': [20000.0,
  0.0,
  100000.0,
  0.0,
  8900.0,
  0.0,
  4300.0,
  0.0,
  47200.0],
 'emi': [6828, 29353, 63409, 278245, 2386, 138707, 3358, 79348, 37481]}

In [145]:
path1 = './data_folder/loan_1.txt'
compute_emis(path1)

{'amount': [100000,
  200000,
  628400,
  4637400,
  42900,
  916000,
  45230,
  991360,
  423000],
 'duration': [36, 12, 120, 240, 90, 16, 48, 99, 27],
 'rate': [0.08, 0.1, 0.12, 0.06, 0.07, 0.13, 0.08, 0.08, 0.09],
 'down_payment': [20000.0, 0, 100000.0, 0, 8900.0, 0, 4300.0, 0, 47200.0],
 'emi': [6828, 29353, 63409, 278245, 2386, 138707, 3358, 79348, 37481]}

In [146]:
path1 = './data_folder/loan_1.txt'
write_csv_columnar(path1)

In [147]:
pd.read_csv(path1)

Unnamed: 0,amount,duration,rate,down_payment,emi
0,100000,36,0.08,20000.0,6828
1,200000,12,0.1,0.0,29353
2,628400,120,0.12,100000.0,63409
3,4637400,240,0.06,0.0,278245
4,42900,90,0.07,8900.0,2386
5,916000,16,0.13,0.0,138707
6,45230,48,0.08,4300.0,3358
7,991360,99,0.08,0.0,79348
8,423000,27,0.09,47200.0,37481


In [150]:
path2 = './data_folder/loan_2.txt'
write_csv_columnar(path2)

In [151]:
pd.read_csv(path2)

Unnamed: 0,amount,duration,rate,down_payment,emi
0,828400,120,0.11,100000.0,80125
1,4633400,240,0.06,0.0,278005
2,42900,90,0.08,8900.0,2723
3,983000,16,0.14,0.0,156902
4,15230,48,0.07,4300.0,797


In [153]:
path3 = './data_folder/loan_3.txt'
write_csv_columnar(path3)

ParserError: Error tokenizing data. C error: Expected 4 fields in line 11, saw 7


In [125]:
os.listdir('./data_folder')

['loan_3.txt', 'loan_2.txt', 'loan_1.txt']

In [117]:
len(os.listdir('./data_folder'))

3

In [127]:
pd.read_csv('./data_folder/loan_1.txt')

Unnamed: 0,amount,duration,rate,down_payment
0,100000,36,0.08,20000.0
1,200000,12,0.1,
2,628400,120,0.12,100000.0
3,4637400,240,0.06,
4,42900,90,0.07,8900.0
5,916000,16,0.13,
6,45230,48,0.08,4300.0
7,991360,99,0.08,
8,423000,27,0.09,47200.0


In [128]:
pd.read_csv('./data_folder/loan_2.txt')

Unnamed: 0,amount,duration,rate,down_payment
0,828400,120,0.11,100000.0
1,4633400,240,0.06,
2,42900,90,0.08,8900.0
3,983000,16,0.14,
4,15230,48,0.07,4300.0


In [129]:
pd.read_csv('./data_folder/loan_3.txt')

ParserError: Error tokenizing data. C error: Expected 4 fields in line 11, saw 7


In [132]:
with open('./data_folder/loan_3.txt') as f:
    print(f.read())

amount,duration,rate,down_payment
45230,48,0.07,4300
883000,16,0.14,
100000,12,0.1,
728400,120,0.12,100000
3637400,240,0.06,
82900,90,0.07,8900
316000,16,0.13,
15230,48,0.08,4300
991360,99,0.08,
323000,27,0.09,4720010000,36,0.08,20000
528400,120,0.11,100000
8633400,240,0.06,
12900,90,0.08,8900


In [167]:
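# Only loan_1.txt and loan_2.txt are processed: the range stops one short of the
# file count because loan_3.txt cannot be parsed (see the ParserError above).
for i in range(len(os.listdir('./data_folder'))-1):
    write_csv_columnar('./data_folder/loan_{}.txt'.format(i+1))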
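
The loop above depends on the files being named `loan_1.txt`, `loan_2.txt`, and so on, and on the unreadable file being the last one. A more general approach is to try every file in the folder and record the ones that fail. The sketch below assumes a hypothetical helper, `process_folder`, that accepts any one-argument function (for example `write_csv_columnar`) as `process`. It catches `ValueError`, which pandas' `ParserError` is a subclass of.

```python
import os

def process_folder(folder, process, suffix='.txt'):
    """Apply process(path) to each matching file; return (done, failed) lists."""
    done, failed = [], []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(suffix):
            continue  # ignore unrelated files such as checkpoints
        try:
            process(os.path.join(folder, name))
            done.append(name)
        except ValueError as err:  # pandas.errors.ParserError subclasses ValueError
            failed.append((name, str(err)))
    return done, failed
```

With the functions defined earlier, `process_folder('./data_folder', write_csv_columnar)` would be expected to list `loan_3.txt` among the failures instead of stopping at the error.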

In [168]:
for i in range(len(os.listdir('./data_folder'))-1):
    pd.read_csv('./data_folder/loan_{}.txt'.format(i+1))

In [171]:
pd.read_csv('./data_folder/loan_1.txt')

Unnamed: 0,amount,duration,rate,down_payment,emi
0,100000,36,0.08,20000.0,6828
1,200000,12,0.1,0.0,29353
2,628400,120,0.12,100000.0,63409
3,4637400,240,0.06,0.0,278245
4,42900,90,0.07,8900.0,2386
5,916000,16,0.13,0.0,138707
6,45230,48,0.08,4300.0,3358
7,991360,99,0.08,0.0,79348
8,423000,27,0.09,47200.0,37481


In [172]:
pd.read_csv('./data_folder/loan_2.txt')

Unnamed: 0,amount,duration,rate,down_payment,emi
0,828400,120,0.11,100000.0,80125
1,4633400,240,0.06,0.0,278005
2,42900,90,0.08,8900.0,2723
3,983000,16,0.14,0.0,156902
4,15230,48,0.07,4300.0,797
