# Set up 

- this notebook needs the `Machine_Learning_AI.txt` file to work with

### If you're running Google Colab
- if you're running Google Colab:
  - upload the file to your Google Drive, locate it in Google Drive from Colab, then copy its path and paste it into the variable called `file_path` in the cell below

### If you're running a Local Jupyter Notebook
- if you're running a Jupyter Notebook locally from Anaconda Navigator or VS Code, download the file and save it in the same folder as the notebook
  - then get the path of the file and paste it into the code cell below

In [1]:
file_path = "Machine_Learning_AI.txt"
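
Before reading the file, it can help to confirm that the path really points at it. The following is a minimal sketch, assuming `file_path` holds the value set in the cell above; the `path` variable is introduced here only for illustration.

```python
from pathlib import Path

# Assumption: same value as the `file_path` variable in the setup cell
file_path = "Machine_Learning_AI.txt"

path = Path(file_path)
if path.is_file():
    # The file was found; show its full location
    print(f"Found {path.resolve()}")
else:
    # The file was not found; show where Python is currently looking
    print(f"Could not find {file_path!r}; current working directory is {Path.cwd()}")
```

If the file is not found, compare the printed working directory with the folder where the file was saved.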

# Reading Files

- use the built-in `open` function
  - `open` first creates a file object
  - the data is then read from that file object

- we are using a text file in our example here



### `open` function and `close` method

In [2]:
# code to open file:
our_text_file = open(file_path,'r')

- here:
  - `our_text_file`: new file object
  - `open`: built-in function that opens a file and returns a file object
  - `file_path`: variable containing the file path string 
    - this can be the actual file path string as well
  - `'r'`: the mode of opening the file
    - common modes for opening a file:
      - `'r'`: reading mode
      - `'w'`: writing mode
      - `'a'`: appending mode
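
The three modes listed above can be compared on a small scratch file. This is a sketch only: `demo_path` and `modes_demo_text` are hypothetical names, and the file is created in a temporary folder so that nothing real is overwritten.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file (or empties an existing one) before writing
f = open(demo_path, "w")
f.write("first line\n")
f.close()

# 'a' keeps the existing content and adds new text at the end
f = open(demo_path, "a")
f.write("second line\n")
f.close()

# 'r' opens the file for reading only
f = open(demo_path, "r")
modes_demo_text = f.read()
f.close()

print(modes_demo_text)
```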

In [3]:
# gather info about the file we opened in the previous code cell
print(our_text_file.name)
print(our_text_file.mode)

Machine_Learning_AI.txt
r


In [4]:
# close the open file 
our_text_file.close()

### `with` keyword 

- every time a file is opened, it has to be closed; otherwise it keeps using memory and other system resources unnecessarily, and in some cases this can become a security flaw
  - opening and closing the file manually gets tedious 
- so the `with` keyword is used, as it closes the file automatically 
  - using the `with` keyword is the best practice for working with files 
  - `open` opens the file at the file path, and `with` runs everything in the indented code block and then closes the file automatically, even if an error occurs inside the block
  - the file object can no longer be read outside the code block because it is closed, but content saved into variables inside the block stays available for later use
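
To make the automatic closing concrete, the sketch below compares a manual `try`/`finally` version with the `with` version. It assumes a small scratch file; `demo_path`, `manual_file` and `managed_file` are hypothetical names used only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not depend on the course file
demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(demo_path, "w") as f:
    f.write("hello\n")

# Manual version: close() has to be called explicitly, even if reading fails
manual_file = open(demo_path, "r")
try:
    manual_content = manual_file.read()
finally:
    manual_file.close()

# `with` version: the file is closed when the indented block ends
with open(demo_path, "r") as managed_file:
    managed_content = managed_file.read()

# Both file objects are closed, and both variables still hold the content
print(manual_file.closed, managed_file.closed)
print(manual_content == managed_content)
```

Both versions read the same content; the `with` version is simply shorter and harder to get wrong.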

In [5]:
# code pattern for opening a file using `with` keyword
with open(file_path,"r") as active_file:
  file_content = active_file.read()
  print(file_content)

# here: `active_file` is the file object 
# `open` is the built-in function that opens the file 
# `.read()` reads the whole content of the text file as a single string


Unit 1: Introduction To Python - 

Intro to the Python Programming language
Control structures and functions
Importing libraries - Numpy and Pandas Basics
Data wrangling and cleaning
Exploratory Data Analysis and descriptive statistics

Unit 2: Database Management - 

Introduction to Databases and SQL
Retrieving data with SQL
Writing SQL Queries
Functions and aggregations
Joins in SQL

Unit 3: Statistics and Data Visualisation - 

Data Visualisation using matplotlib/seaborn
Plotting Data distributions
Introduction to probability
Discrete and Continuous probability distributions
Inferential Statistics and Hypothesis testing

Unit 4: Machine Learning:

Introduction to machine learning models 
Simple and Multiple Regression Models
Logistic Regression
ROC and AUC analysis
KNN and Naive Bayes classifier
SVM Classification and Regression
Tree Based models:
- Decision trees
- Truncation and pruning
- Random Forests and Bagged models
- Gradient Boosted models
Feature selection methods 
Model e

- so `with` was used to read the content of the file at `file_path`, for example `"/content/drive/My Drive/Colab Notebooks/Machine_Learning_AI.txt"` when the file is saved in Google Drive

In [6]:
# check if file is closed 
print(active_file.closed)

True


In [20]:
# check if content is saved for later use
print(file_content)

Unit 1: Introduction To Python - 

Intro to the Python Programming language
Control structures and functions
Importing libraries - Numpy and Pandas Basics
Data wrangling and cleaning
Exploratory Data Analysis and descriptive statistics

Unit 2: Database Management - 

Introduction to Databases and SQL
Retrieving data with SQL
Writing SQL Queries
Functions and aggregations
Joins in SQL

Unit 3: Statistics and Data Visualisation - 

Data Visualisation using matplotlib/seaborn
Plotting Data distributions
Introduction to probability
Discrete and Continuous probability distributions
Inferential Statistics and Hypothesis testing

Unit 4: Machine Learning:

Introduction to machine learning models 
Simple and Multiple Regression Models
Logistic Regression
ROC and AUC analysis
KNN and Naive Bayes classifier
SVM Classification and Regression
Tree Based models:
- Decision trees
- Truncation and pruning
- Random Forests and Bagged models
- Gradient Boosted models
Feature selection methods 
Model e

In [36]:
# check the length of the file content (number of characters of the file)
print(len(file_content))

1517


### reading file content: `.readlines()`

- the above output is just one huge block of text 
- it needs to be more manageable 
  - so the file content is read into a list inside the `with` block
- the `readlines()` method reads all the lines and returns them as a list of strings
  - each element of the list is one line of the file, including its trailing newline character `\n`
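
As a small illustration of what `readlines()` returns, the sketch below compares it with `read().splitlines()`, which drops the trailing newline characters. The scratch file and the names `demo_path`, `lines_with_newlines` and `lines_without_newlines` are assumptions made for this example.

```python
import os
import tempfile

# Hypothetical three-line scratch file
demo_path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(demo_path, "w") as f:
    f.write("Unit 1\n\nIntro to Python\n")

# readlines() keeps the newline character at the end of each line
with open(demo_path, "r") as f:
    lines_with_newlines = f.readlines()

# read().splitlines() returns the same lines without the newline characters
with open(demo_path, "r") as f:
    lines_without_newlines = f.read().splitlines()

print(lines_with_newlines)     # ['Unit 1\n', '\n', 'Intro to Python\n']
print(lines_without_newlines)  # ['Unit 1', '', 'Intro to Python']
```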

In [9]:
# open file using `with` keyword
with open(file_path,"r") as active_file:
  
  # convert file content to a list
  file_content_list = active_file.readlines()

  # output single elements from the list 
  # (they correspond to lines of the text file)
  print(file_content_list[0]) # prints the 1st line
  print(file_content_list[10]) # prints the 11th line
  print(file_content_list[54]) # prints the 55th line 


Unit 1: Introduction To Python - 

Introduction to Databases and SQL

Reinforcement Learning Models



- next, we check all the lines read by the `with` code block 
  - also get the number of lines in the file

In [10]:
# open file using `with` keyword
with open(file_path,"r") as active_file:
  
  # convert file content to a list using `.readlines()`
  file_content_list = active_file.readlines()

  # print all file lines as a list 
  print(file_content_list)

  # get the number of lines in the file 
  print(len(file_content_list))

['Unit 1: Introduction To Python - \n', '\n', 'Intro to the Python Programming language\n', 'Control structures and functions\n', 'Importing libraries - Numpy and Pandas Basics\n', 'Data wrangling and cleaning\n', 'Exploratory Data Analysis and descriptive statistics\n', '\n', 'Unit 2: Database Management - \n', '\n', 'Introduction to Databases and SQL\n', 'Retrieving data with SQL\n', 'Writing SQL Queries\n', 'Functions and aggregations\n', 'Joins in SQL\n', '\n', 'Unit 3: Statistics and Data Visualisation - \n', '\n', 'Data Visualisation using matplotlib/seaborn\n', 'Plotting Data distributions\n', 'Introduction to probability\n', 'Discrete and Continuous probability distributions\n', 'Inferential Statistics and Hypothesis testing\n', '\n', 'Unit 4: Machine Learning:\n', '\n', 'Introduction to machine learning models \n', 'Simple and Multiple Regression Models\n', 'Logistic Regression\n', 'ROC and AUC analysis\n', 'KNN and Naive Bayes classifier\n', 'SVM Classification and Regression

- after saving the file lines to a list, it can be looped over with a `for` loop


In [11]:
# set up a `for` loop to iterate over all the lines in the file content list
for line in file_content_list:
  print(len(line)) # print the length of each line

34
1
41
33
46
28
53
1
31
1
34
25
20
27
13
1
45
1
44
28
28
50
46
1
26
1
41
38
20
21
31
34
19
17
25
35
26
27
45
23
40
31
20
1
34
1
25
25
21
30
1
13
25
63
30
1
7
1
86
1
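
The lengths above include the newline character at the end of each line, which is why blank lines show a length of 1. A minimal sketch of cleaning the lines while looping follows; `sample_lines` is a hypothetical stand-in for `file_content_list`, so the example runs without the course file.

```python
# Hypothetical stand-in for `file_content_list`
sample_lines = ["Unit 1: Introduction To Python - \n", "\n", "Joins in SQL\n"]

stripped_lines = []
for line_number, line in enumerate(sample_lines, start=1):
    text = line.strip()  # remove surrounding whitespace, including the newline
    if not text:
        continue         # skip lines that are blank after stripping
    stripped_lines.append(text)
    print(line_number, len(line), len(text), text)
```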


# Writing Files

- use `with open(...)` again so the file is closed automatically
  - `"w"`: mode to write to a file (creates the file if it does not exist, and overwrites any existing content)
  - `"a"`: mode to append to a file
  - `"x"`: mode to create a new file and open it for writing (fails if the file already exists)

- use the `.write()` method to write text to the file inside the `with` block

In [12]:
# code pattern to create a new file and write to it 
with open('new_file.txt','x') as file_to_write:
  file_to_write.write('this is line 0\n')
  file_to_write.write('this is line 1\n')

- check the newly created file in your file browser
- if the code throws a `"FileExistsError: ... File exists: 'new_file.txt'"` error, a file with the same name already exists
  - so change the mode to `'w'` to overwrite the existing file (its previous content is lost), or change the file name to create a new file 
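
One way to handle the error instead of changing the mode by hand is to catch it. The sketch below tries to create the same file twice; `new_path` and `outcomes` are hypothetical names, and the file lives in a temporary folder.

```python
import os
import tempfile

# Hypothetical location inside a fresh temporary folder
new_path = os.path.join(tempfile.mkdtemp(), "new_file.txt")

outcomes = []
for attempt in range(2):
    try:
        # 'x' only succeeds if the file does not exist yet
        with open(new_path, "x") as file_to_write:
            file_to_write.write("this is line 0\n")
        outcomes.append("created")
    except FileExistsError:
        # The second attempt lands here and leaves the file untouched
        outcomes.append("exists")

print(outcomes)
```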

- `for` loops can be used to write some list's content to a file within the `with` block
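
A minimal sketch of that pattern, assuming a hypothetical list called `topics` and a scratch file in a temporary folder:

```python
import os
import tempfile

# Hypothetical list content and file location
topics = ["Decision trees", "Random Forests", "Gradient Boosted models"]
list_path = os.path.join(tempfile.mkdtemp(), "topics.txt")

# Write one list element per line; .write() does not add the newline itself
with open(list_path, "w") as file_to_write:
    for topic in topics:
        file_to_write.write(topic + "\n")

# Read the file back to check what was written
with open(list_path, "r") as written_file:
    written_lines = written_file.read().splitlines()

print(written_lines)
```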

# Additional Learning 

- research more about file object methods 
  - see the built-in help for syntax
  - e.g. `help(open)` 
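
As a starting point for exploring the methods, the sketch below lists the public attributes of `io.TextIOWrapper`, the class of the file object that `open` returns in text mode. The name `file_methods` is introduced only for this example.

```python
import io

# Public (non-underscore) names available on text-mode file objects
file_methods = [name for name in dir(io.TextIOWrapper) if not name.startswith("_")]
print(file_methods)

# Show the help text for one specific method
help(io.TextIOWrapper.readlines)
```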

In [40]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise IOError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position