In [None]:
import os
os.chdir('/content/drive/My Drive/Colab Notebooks/ML_Roadmap/Jeff-Heatons-DL/Module 1')
!pwd

# File Handling

There are many different types of files that you must process as an AI practitioner. Some of these file types are listed here:

1. CSV files (generally have the .csv extension) hold tabular data that resembles spreadsheet data.

2. Image files (generally with the .png or .jpg extension) hold images for computer vision.

3. Text files (often have the .txt extension) hold unstructured text and are essential for natural language processing.

4. JSON files (often with the .json extension) contain semi-structured data in a human-readable, text-based format.

5. H5 files (can have a wide array of extensions) contain semi-structured, hierarchical data in the binary HDF5 format, which is not human-readable. Keras and TensorFlow store neural networks as H5 files.

6. Audio files (often have an extension such as .au or .wav) contain recorded sound.
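
As a small illustration of item 4, Python's standard `json` module parses JSON text into dictionaries and lists. The sketch below uses a made-up in-memory record (the field names are hypothetical and not taken from the course data) so that it runs without a file or network access.

```python
import json

# Hypothetical JSON text; in practice this would be read from a .json file
json_text = '{"species": "Iris-setosa", "measurements": {"sepal_l": 5.1, "sepal_w": 3.5}}'

# Parse the JSON text into nested Python dicts
record = json.loads(json_text)

# Access values by key, just like a regular dictionary
species = record["species"]
sepal_l = record["measurements"]["sepal_l"]
print(species, sepal_l)
```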

Python programs can read CSV files into a DataFrame with Pandas.

In [None]:
# Fisher's Iris data set
import pandas as pd
df = pd.read_csv("https://data.heatonresearch.com/data/t81-558/iris.csv")

In [None]:
df.head()

Unnamed: 0,sepal_l,sepal_w,petal_l,petal_w,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


## Stream a Large CSV File

By default, Pandas reads the entire CSV file into memory. Usually, this is fine. However, at times you may wish to
"stream" a huge file. Streaming allows you to process the file one record at a time without loading all of it. The following
code reads the Iris dataset and calculates averages, one row at a time. This technique also works for files that are too
large to fit in memory.

In [None]:
import csv
import urllib.request
import codecs
import numpy as np

url = "https://data.heatonresearch.com/data/t81-558/iris.csv"
total = np.zeros(4)  # Running sum of the 4 measurements (avoids shadowing the built-in sum)
count = 0

with urllib.request.urlopen(url) as urlstream:
    csvfile = csv.reader(codecs.iterdecode(urlstream, 'utf-8'))
    next(csvfile)  # Skip header row

    for line in csvfile:
        # Convert the first 4 columns of each row to a NumPy array
        line2 = np.array(line)[0:4].astype(float)

        # If the line is of the right length (skip empty lines), then add
        if len(line2) == 4:
            total += line2
            count += 1

# Calculate the average, and print the average of the 4 iris
# measurements (features)
print(total / count)

[5.84333333 3.05733333 3.758      1.19933333]
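
Pandas can also stream a CSV file in pieces through the `chunksize` parameter of `read_csv`, which returns an iterator of DataFrames instead of one large DataFrame. The following is a minimal sketch of the same running-average idea. The tiny in-memory CSV (via `io.StringIO`) and the chunk size of 2 are assumptions chosen so that the example runs without a download; a real large file would use a much bigger chunk size.

```python
import io
import numpy as np
import pandas as pd

# Small stand-in for a large CSV file (a few rows in the Iris layout)
csv_text = """sepal_l,sepal_w,petal_l,petal_w,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
"""

features = ["sepal_l", "sepal_w", "petal_l", "petal_w"]
total = np.zeros(4)  # Running sum of each feature
count = 0            # Number of rows seen so far

# With chunksize set, read_csv yields DataFrames of at most 2 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk[features].sum().to_numpy()
    count += len(chunk)

averages = total / count
print(averages)
```

Only one chunk is held in memory at a time, so the memory cost depends on the chunk size rather than on the file size.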


## Reading an image:
Computer vision is one of the areas where neural networks outshine other models. To support computer vision,
the Python programmer needs to understand how to process images. For this course, we will use the Python
PIL package (provided by Pillow) for image processing. The following code demonstrates how to load an image
from a URL and display it.

In [None]:
%matplotlib inline
from PIL import Image
import requests
from io import BytesIO

url = "https://upload.wikimedia.org/wikipedia/commons/9/92/Brookings.jpg"

response = requests.get(url)
img = Image.open(BytesIO(response.content))

img

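
For neural network input, an image is usually converted to an array of pixel values. The following is a minimal sketch, assuming a small solid-color image created in memory in place of the downloaded photo (the 64x48 size, the color, and the 32x32 target size are arbitrary choices for illustration).

```python
import numpy as np
from PIL import Image

# Stand-in image: 64 pixels wide, 48 pixels high, solid RGB color
img = Image.new("RGB", (64, 48), color=(255, 128, 0))

# PIL reports size as (width, height); NumPy uses (rows, columns, channels)
pixels = np.asarray(img)
print(img.size, pixels.shape, pixels.dtype)

# Resize to a fixed size so that every image has the same dimensions
small = img.resize((32, 32))
small_pixels = np.asarray(small)
```

Note that the axis order differs between the two libraries: PIL describes the image as width by height, while the NumPy array is indexed by row (height) first.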

## Read a Text File

In [None]:
import codecs
import urllib.request

url = "https://data.heatonresearch.com/data/t81-558/datasets/sonnet_18.txt"
with urllib.request.urlopen(url) as urlstream:
    for line in codecs.iterdecode(urlstream, 'utf-8'):
        print(line.rstrip())

Sonnet 18 original text
William Shakespeare

Shall I compare thee to a summer's day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm'd;
And every fair from fair sometime declines,
By chance or nature's changing course untrimm'd;
But thy eternal summer shall not fade
Nor lose possession of that fair thou owest;
Nor shall Death brag thou wander'st in his shade,
When in eternal lines to time thou growest:
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee.
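
The same line-by-line pattern works for any byte stream. The following is a minimal sketch, assuming an in-memory `io.BytesIO` stream standing in for the URL response, and using `io.TextIOWrapper` (a standard-library alternative to `codecs.iterdecode`) to decode it.

```python
import io

# Stand-in for a downloaded byte stream (first two lines of the sonnet)
raw = io.BytesIO(
    b"Shall I compare thee to a summer's day?\n"
    b"Thou art more lovely and more temperate:\n"
)

lines = []
# TextIOWrapper decodes the bytes to text as the stream is iterated
with io.TextIOWrapper(raw, encoding="utf-8") as text_stream:
    for line in text_stream:
        lines.append(line.rstrip())  # Strip the trailing newline

print(len(lines), "lines read")
```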
