# Interacting with the OS and filesystem
The os module in Python provides many functions for interacting with the OS and the filesystem. Let's import it and try out some examples.

In [1]:
import os

We can check the current working directory using the os.getcwd function.

In [2]:
os.getcwd()

'c:\\Users\\HP\\Desktop\\data Analysis'

To get the list of files in a directory, use os.listdir. You pass an absolute or relative path of a directory as the argument to the function.

In [3]:
help(os.listdir)

Help on built-in function listdir in module nt:

listdir(path=None)
    Return a list containing the names of the files in the directory.
    
    path can be specified as either str, bytes, or a path-like object.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    If path is None, uses the path='.'.
    On some platforms, path may also be specified as an open file descriptor;\
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.
    
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.



In [4]:
os.listdir()

['.ipynb_checkpoints',
 'climate.txt',
 'climate_result.txt',
 'data',
 'Numpylibrary.ipynb',
 'PythonwithOS.ipynb']

In [5]:
os.listdir('.') #relative path

['.ipynb_checkpoints',
 'climate.txt',
 'climate_result.txt',
 'data',
 'Numpylibrary.ipynb',
 'PythonwithOS.ipynb']

In [6]:
os.listdir('/users') # absolute path

['All Users', 'Default', 'Default User', 'desktop.ini', 'HP', 'Public']
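
Note that os.listdir returns bare names, not full paths, and it does not say which entries are directories. As a minimal sketch, the entries can be joined back to their parent path and checked with os.path.isdir. The helper name `split_entries` and the sample contents are hypothetical, and the demo runs in a temporary directory so it works on any machine.

```python
import os
import tempfile

def split_entries(path='.'):
    """Return (directories, files) found directly inside `path`, each sorted by name."""
    dirs, files = [], []
    for name in os.listdir(path):
        # os.listdir returns bare names, so join them with the parent path
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path):
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files)

# Demonstrate on a throwaway directory (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'data'))
    with open(os.path.join(tmp, 'climate.txt'), 'w') as f:
        f.write('sample')
    demo_dirs, demo_files = split_entries(tmp)

print(demo_dirs, demo_files)
```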

You can create a new directory using os.makedirs. Let's create a new directory called data, where we'll later download some files.

In [7]:
os.makedirs('./data', exist_ok=True)

In [8]:
os.makedirs('./data', exist_ok=False)

FileExistsError: [WinError 183] Cannot create a file when that file already exists: './data'
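
The two calls above differ only in exist_ok. A small sketch of that behaviour, run in a temporary directory so it does not touch the real data folder (the variable names `target` and `raised` are just for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'data')

    os.makedirs(target, exist_ok=True)   # creates the directory
    os.makedirs(target, exist_ok=True)   # already exists: silently does nothing

    try:
        os.makedirs(target, exist_ok=False)  # already exists: raises
        raised = False
    except FileExistsError:
        raised = True

print(raised)
```

Passing exist_ok=True is usually the convenient choice in notebooks, where a cell may be re-run many times.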

In [9]:
'data' in os.listdir('.')

True

In [10]:
os.listdir('./data')

['loans1.txt', 'loans2.txt', 'loans3.txt']

Let us download some files into the data directory using the urllib module.

In [11]:
url1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [12]:
from urllib.request import urlretrieve

In [13]:
urlretrieve(url1, './data/loans1.txt')

('./data/loans1.txt', <http.client.HTTPMessage at 0x266225455a0>)

In [14]:
urlretrieve(url2, './data/loans2.txt')

('./data/loans2.txt', <http.client.HTTPMessage at 0x26622545570>)

In [15]:
urlretrieve(url3, './data/loans3.txt')

('./data/loans3.txt', <http.client.HTTPMessage at 0x26622544b20>)

Let's verify that the files were downloaded.

In [16]:
os.listdir('./data')

['loans1.txt', 'loans2.txt', 'loans3.txt']

In [17]:
help(urlretrieve)

Help on function urlretrieve in module urllib.request:

urlretrieve(url, filename=None, reporthook=None, data=None)
    Retrieve a URL into a temporary location on disk.
    
    Requires a URL argument. If a filename is passed, it is used as
    the temporary file location. The reporthook argument should be
    a callable that accepts a block number, a read size, and the
    total file size of the URL target. The data argument should be
    valid URL encoded data.
    
    If a filename is passed and the URL points to a local resource,
    the result is a copy from local file to new file.
    
    Returns a tuple containing the path to the newly created
    data file as well as the resulting HTTPMessage object.
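
The help text mentions two things that are easy to try without a network connection: the reporthook callback, and copying from a local resource. The sketch below assumes a `file://` URL built with pathlib as a stand-in for a remote address; the file names and the `report` function are hypothetical.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

progress = []  # collects (block_number, block_size, total_size) tuples

def report(block_number, block_size, total_size):
    # urlretrieve calls this once before reading and again after each block
    progress.append((block_number, block_size, total_size))

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / 'source.txt'
    source.write_text('amount,duration,rate,down_payment\n')
    target = Path(tmp) / 'copy.txt'

    # as_uri() produces a file:// URL, standing in for a remote address
    saved_path, headers = urlretrieve(source.as_uri(), str(target), reporthook=report)
    copied_text = target.read_text()

print(saved_path)
print(len(progress) > 0)
```

With a real download, the same callback could be used to print a simple progress indicator.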



# Reading from a file
To read the contents of a file, we first need to <span style="background-color: gray">open</span> the file using the built-in open function. The open function returns a file object, which provides several methods for interacting with the file's contents.

In [18]:
file1 = open('./data/loans1.txt', mode='r')

The open function also accepts a mode argument that specifies how we can interact with the file. The following options are supported:

    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================
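
To make the difference between some of these modes concrete, here is a short sketch that writes, appends, truncates, and then tries exclusive creation. It uses a temporary file with the made-up name `notes.txt`, so nothing in the data directory is changed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')

    with open(path, 'w') as f:      # 'w' creates the file (or empties an existing one)
        f.write('first\n')
    with open(path, 'a') as f:      # 'a' keeps existing content and writes at the end
        f.write('second\n')
    with open(path, 'r') as f:      # 'r' is read-only and the default
        after_append = f.read()

    with open(path, 'w') as f:      # reopening with 'w' truncates what was there
        f.write('replaced\n')
    with open(path) as f:
        after_truncate = f.read()

    try:
        open(path, 'x')             # 'x' refuses to overwrite an existing file
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_append), repr(after_truncate), x_failed)
```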
To view the contents of the file, we can use the read method of the file object.

In [19]:
file1_content = file1.read()

In [20]:
print(file1_content)

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


The file contains information about loans. It is a set of comma-separated values (CSV).

**CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)

The first line of the file is the header, indicating what each of the numbers on the remaining lines represents. Each of the remaining lines provides information about a loan. Thus, the second line 100000,36,0.08,20000 represents a loan with:

* an amount of $100000,
* duration of 36 months,
* rate of interest of 8% per annum, and
* a down payment of $20000

The CSV is a standard file format used for sharing data for analysis and visualization. Over the course of this tutorial, we will read the data from these CSV files, process it, and write the results back to files. Before we continue, let's close the file using the <span style="background-color: black">close</span> method (otherwise, the file handle stays open and keeps holding operating system resources until the object is garbage-collected or the program exits).

In [21]:
file1.close()

Once a file is closed, you can no longer read from it.

In [22]:
file1.read()

ValueError: I/O operation on closed file.

### Closing files automatically using <span style="background-color:gray"> with </span>
To close a file automatically after you've processed it, you can open it using the with statement.

In [23]:
with open('./data/loans2.txt', 'r') as file2:
    file2_content = file2.read()
    print(file2_content)

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


In [24]:
file2.read()

ValueError: I/O operation on closed file.
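
One reason to prefer with is that the file is closed even if an error occurs partway through processing. The sketch below simulates such a failure on a temporary file and then inspects the file object's closed attribute; the file name and the simulated error are assumptions made for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'loans.txt')
    with open(path, 'w') as f:
        f.write('amount,duration\n100000,36\n')

    try:
        with open(path) as handle:
            first_line = handle.readline()
            raise RuntimeError('simulated failure while processing')
    except RuntimeError:
        pass

    closed_after_error = handle.closed  # True: the with block closed the file anyway

print(first_line.strip(), closed_after_error)
```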

In [27]:
with open('./data/loans3.txt','r') as file3:
    file3_lines = file3.readlines()

In [28]:
file3_lines

['amount,duration,rate,down_payment\n',
 '45230,48,0.07,4300\n',
 '883000,16,0.14,\n',
 '100000,12,0.1,\n',
 '728400,120,0.12,100000\n',
 '3637400,240,0.06,\n',
 '82900,90,0.07,8900\n',
 '316000,16,0.13,\n',
 '15230,48,0.08,4300\n',
 '991360,99,0.08,\n',
 '323000,27,0.09,4720010000,36,0.08,20000\n',
 '528400,120,0.11,100000\n',
 '8633400,240,0.06,\n',
 '12900,90,0.08,8900']

In [30]:
file3_lines[0].strip()

'amount,duration,rate,down_payment'

In [31]:
type(file3_lines)

list

In [33]:
for line in file3_lines:  # iterating over the list directly is simpler than indexing with range(len(...))
    print(line.strip())   # strip() removes the trailing '\n' that readlines keeps on each line

amount,duration,rate,down_payment
45230,48,0.07,4300
883000,16,0.14,
100000,12,0.1,
728400,120,0.12,100000
3637400,240,0.06,
82900,90,0.07,8900
316000,16,0.13,
15230,48,0.08,4300
991360,99,0.08,
323000,27,0.09,4720010000,36,0.08,20000
528400,120,0.11,100000
8633400,240,0.06,
12900,90,0.08,8900
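
As an alternative to readlines, a file object can be looped over directly, which yields one line at a time. This is a sketch on a small temporary file (the contents are a shortened, made-up sample), and it may be preferable for large files because the full list of lines is never built.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'loans.txt')
    with open(path, 'w') as f:
        f.write('amount,duration\n45230,48\n883000,16')

    stripped_lines = []
    with open(path) as f:
        # A file object yields one line at a time, so the whole file
        # never has to be held in a list
        for line in f:
            stripped_lines.append(line.strip())

print(stripped_lines)
```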


### Processing data from files
Before performing any operations on the data stored in a file, we need to convert the file's contents from one large string into Python data types. For the file <span style="background-color:black">loans1.txt</span> containing information about loans in a CSV format, we can do the following:

* Read the file line by line
* Parse the first line to get a list of the column names or headers
* Split each remaining line and convert each value into a float
* Create a dictionary for each loan using the headers as keys
* Create a list of dictionaries to keep track of all the loans

Since we will perform the same operations for multiple files, it would be useful to define a function <span style="background-color:black">read_csv</span>. We'll also define some helper functions to build up the functionality step by step.

Let's start by defining a function <span style="background-color:black">parse_headers</span> that takes a line as input and returns a list of column headers.

In [34]:
print(file2_content)

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


In [35]:
'828400,120,0.11,100000'.split(',')

['828400', '120', '0.11', '100000']

In [36]:
loan1= {
    'amount':828400,
    'duration':120,
    'rate':0.11,
    'down_payment':100000
}

In [37]:
import numpy as np

In [43]:
loans = np.array([{'amount':828400,'duration':120,'rate':0.11,'down_payment':100000},
        {'amount':4633400,'duration':240,'rate':0.06,'down_payment':" "},
        {'amount':42900,'duration':90,'rate':0.08,'down_payment':8900}])

In [44]:
loans

array([{'amount': 828400, 'duration': 120, 'rate': 0.11, 'down_payment': 100000},
       {'amount': 4633400, 'duration': 240, 'rate': 0.06, 'down_payment': ' '},
       {'amount': 42900, 'duration': 90, 'rate': 0.08, 'down_payment': 8900}],
      dtype=object)

In [45]:
type(loans)

numpy.ndarray

In [46]:
def parse_headers(header_line):
    return header_line.strip().split(',')

The <span style="background-color:black">strip</span> method removes leading and trailing whitespace, including the newline character <span style="background-color:black">\n</span>. The <span style="background-color:black">split</span> method breaks a string into a list using the given separator (<span style="background-color:black">,</span> in this case).
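
A quick illustration of these two string methods on made-up input, including what split returns when a line ends with a comma (which matters for rows with a missing down payment):

```python
line = '  amount,duration,rate,down_payment \n'

stripped = line.strip()            # drops leading and trailing whitespace, including '\n'
only_newline = line.rstrip('\n')   # drops only the trailing newline
fields = stripped.split(',')       # cuts at every comma
empty_last = '883000,16,0.14,'.split(',')  # a trailing comma yields an empty final field

print(repr(stripped))
print(repr(only_newline))
print(fields)
print(empty_last)
```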

In [47]:
file3_lines[0]

'amount,duration,rate,down_payment\n'

In [48]:
header = parse_headers(file3_lines[0])

In [49]:
header

['amount', 'duration', 'rate', 'down_payment']

Next, let's define a function <span style="background-color:black">parse_values</span> that takes a line containing some data and returns a list of floating-point numbers.

In [53]:
def parse_values(data_lines):
    values = []
    for item in data_lines.strip().split(','):
        values.append(float(item))
    return values    

In [54]:
file3_lines[1]

'45230,48,0.07,4300\n'

In [58]:
file3_lines[1].strip().split(',')

['45230', '48', '0.07', '4300']

In [59]:
parse_values(file3_lines[1])

[45230.0, 48.0, 0.07, 4300.0]

The values were parsed and converted to floating point numbers, as expected. Let's try it for another line from the file, which does not contain a value for the down payment.

In [60]:
file3_lines[2]

'883000,16,0.14,\n'

In [62]:
file3_lines[2].strip().split(',')

['883000', '16', '0.14', '']

In [63]:
float('')

ValueError: could not convert string to float: ''

In [61]:
parse_values(file3_lines[2])

ValueError: could not convert string to float: ''

The code above leads to a <span style="background-color:black">ValueError</span> because the empty string <span style="background-color:black">''</span> cannot be converted to a float. We can enhance the <span style="background-color:black">parse_values</span> function to handle this edge case. We will also handle the case where the value is not a float.

In [67]:
def parse_values(data_lines):
    values = []
    for item in data_lines.strip().split(','):
        if item == '':
            # Treat a missing field as 0.0
            values.append(0.0)
        else:
            try:
                values.append(float(item))
            except ValueError:
                # Keep non-numeric text as a string
                values.append(item)
    return values

In [68]:
file3_lines[2]

'883000,16,0.14,\n'

In [69]:
parse_values(file3_lines[2])

[883000.0, 16.0, 0.14, 0.0]
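
Replacing a missing value with 0.0 works for the down payment, where "not provided" can reasonably be read as "no down payment". For other columns it may be safer to keep missing values distinguishable. The variant below is a hypothetical alternative, not part of this tutorial's pipeline: it records an empty field as None.

```python
def parse_values_keep_missing(data_line):
    """Like parse_values, but records an empty field as None instead of 0.0."""
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(None)      # missing, rather than "zero"
        else:
            try:
                values.append(float(item))
            except ValueError:
                values.append(item)  # keep non-numeric text unchanged
    return values

missing_example = parse_values_keep_missing('883000,16,0.14,\n')
text_example = parse_values_keep_missing('883000,16,n/a,0\n')
print(missing_example)
print(text_example)
```

Any later arithmetic would then need to check for None explicitly, which is the trade-off for not conflating "missing" with "zero".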

Next, let's define a function <span style="background-color:black">create_item_dict</span> that takes a list of values and a list of headers as inputs and returns a dictionary with the values associated with their respective headers as keys.

In [75]:
def create_item_dict(headers,values):
    result = {}
    for header,value in zip(headers,values):
        result[header] = value
    return result    

In [76]:
header

['amount', 'duration', 'rate', 'down_payment']

In [77]:
values1 = parse_values(file3_lines[1])

In [78]:
values1

[45230.0, 48.0, 0.07, 4300.0]

In [79]:
create_item_dict(header,values1)

{'amount': 45230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}

As expected, the values and headers are combined to create a dictionary with the appropriate key-value pairs.
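
The same pairing can also be written in one step by passing the zip object to dict. The sketch below uses sample values from this section and also shows one caveat: zip stops at the shorter input, so a row with too few values produces a dictionary with fewer keys rather than an error.

```python
headers = ['amount', 'duration', 'rate', 'down_payment']
values = [45230.0, 48.0, 0.07, 4300.0]

loan = dict(zip(headers, values))

# zip stops at the shorter input, so a short row silently loses keys
short_row = dict(zip(headers, [45230.0, 48.0]))

print(loan)
print(short_row)
```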

We are now ready to put it all together and define the <span style="background-color:black">read_csv</span> function.

In [80]:
def read_csv(path):
    result=[]
    # Open the file in read mode
    with open(path,'r') as f:
        # Get a list of lines
        lines = f.readlines()
        # Parse the header
        header = parse_headers(lines[0])
        # Loop over the remaining lines
        for data_lines in lines[1:]:
            # Parse the values
            values = parse_values(data_lines)
            # Create a dictionary using values & headers
            item_dict = create_item_dict(header,values)
            # Add the dictionary to the result
            result.append(item_dict)
    return result        

In [81]:
with open('./data/loans2.txt') as file2:
    print(file2.read())

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


In [82]:
read_csv('./data/loans2.txt')

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0},
 {'amount': 4633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.08, 'down_payment': 8900.0},
 {'amount': 983000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0},
 {'amount': 15230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}]
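
For comparison, Python's standard library includes a csv module whose DictReader performs the header and row pairing for us. This is a different technique from the hand-written read_csv above, sketched here on an in-memory sample that mirrors the first rows of loans2.txt. It assumes every column is numeric and, like parse_values, maps an empty field to 0.0.

```python
import csv
import io

# Stand-in for a file on disk; the text mirrors the loans2.txt sample
sample = io.StringIO(
    'amount,duration,rate,down_payment\n'
    '828400,120,0.11,100000\n'
    '4633400,240,0.06,\n'
)

loans = []
for row in csv.DictReader(sample):
    # DictReader yields every field as a string, so convert explicitly
    loans.append({key: float(value) if value else 0.0 for key, value in row.items()})

print(loans)
```

When reading from a real file instead of a StringIO object, the csv documentation recommends opening it with newline=''.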