---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 3.7</h1>

# _07-IO Operations.ipynb_

# Learning agenda of this notebook
1. Reading Data from text/csv Files
2. Writing data to files
3. A Sample Project

In [1]:
# To install this library in Jupyter notebook
#import sys
#!{sys.executable} -m pip install numpy

In [2]:
import numpy as np
np.__version__ , np.__path__

('1.19.5',
 ['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy'])

## 1. Reading Data from Files
Both ```numpy.loadtxt()``` and ```numpy.genfromtxt()``` load data from a text file. For clean data they behave much the same; the main difference is that `np.genfromtxt()` can handle missing values, while `np.loadtxt()` requires every row to contain the same number of valid values.
```
numpy.loadtxt(fname, dtype=float, delimiter=None, skiprows=0, comments='#')
numpy.genfromtxt(fname, dtype=float, delimiter=None, skip_header=0, comments='#', missing_values=None, filling_values=None)
```

   - **fname:** Filename or generator to read. The extension can be .txt or anything else, but the file contents must be plain text. If the filename extension is .gz or .bz2, the file is first decompressed.
   - **dtype:** Data type of the resulting array; default: float.
   - **delimiter:** The character or string of characters that separates individual values on a line. The default is any whitespace; common alternatives are "," and "\t".
   - **skiprows:** (`loadtxt`) Number of lines to skip at the start of the file; default: 0.
   - **skip_header:** (`genfromtxt`) Number of lines to skip at the start of the file; default: 0.
   - **comments:** The character or list of characters used to indicate the start of a comment. None implies no comments. For backwards compatibility, byte strings will be decoded as 'latin1'. The default is '#'.
   - **missing_values:** (`genfromtxt`) The strings that mark missing data.
   - **filling_values:** (`genfromtxt`) The values used in place of missing data.
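
The difference in missing-value handling can be illustrated with a minimal sketch. The in-memory text and the fill value `0` below are made up for illustration; they are not part of the course datasets.

```python
import numpy as np
from io import StringIO

# Hypothetical CSV text with an empty field in the second row
text = "1,2,3\n4,,6\n7,8,9"

# genfromtxt turns the empty field into nan by default
arr_nan = np.genfromtxt(StringIO(text), delimiter=",")

# filling_values supplies a replacement for missing fields
arr_filled = np.genfromtxt(StringIO(text), delimiter=",", filling_values=0)

# loadtxt cannot convert the empty field and raises a ValueError
try:
    np.loadtxt(StringIO(text), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

print(arr_nan)
print(arr_filled)
print("loadtxt failed:", loadtxt_failed)
```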

In [3]:
#np.genfromtxt()

In [4]:
# Example 1: 
import numpy as np
arr = np.genfromtxt("datasets/data.txt")
print(arr)

[[12. 19. 62.]
 [ 9. 33. 61.]]


In [5]:
# Example 2: 
import numpy as np
arr = np.genfromtxt("datasets/data.txt", dtype=int)
print(arr)

[[12 19 62]
 [ 9 33 61]]


In [6]:
# Example 3: 
import numpy as np
from io import StringIO   

# StringIO behaves like a file object
data = StringIO("0, 1, 2 \n 3, 4, 5")
arr1 = np.genfromtxt(data, dtype=int, delimiter=",")
print(arr1)

[[0 1 2]
 [3 4 5]]


In [7]:
# Example 4: 
import numpy as np
from io import StringIO   

# StringIO behaves like a file object
data = StringIO("Temp Humidity Rainfall \n 34, 12, 5 \n 36, 14, 7 \n 30, 18, 6 \n 39, 11, 4")
arr1 = np.genfromtxt(data, dtype=int, delimiter=",", skip_header=1)
print(arr1)

[[34 12  5]
 [36 14  7]
 [30 18  6]
 [39 11  4]]


**CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. 

In [8]:
# Example 5:
import numpy as np
 
data = np.genfromtxt("datasets/icecreamsales.csv", skip_header=1, usecols=[0, 1], delimiter=',')
 
print("Ice Cream Sales data: \n", data)

# Sum of columns
print("\nSum of columns:", np.sum(data,axis=0))
print("Mean of columns: ", np.mean(data, axis=0))
print("Median of columns: ", np.median(data, axis=0))


Ice Cream Sales data: 
 [[ 37. 292.]
 [ 40. 228.]
 [ 49. 324.]
 [ 61. 376.]
 [ 72. 440.]
 [ 79. 496.]
 [ 83. 536.]
 [ 81. 556.]
 [ 75. 496.]
 [ 64. 412.]
 [ 53. 324.]
 [ 40. 320.]]

Sum of columns: [ 734. 4800.]
Mean of columns:  [ 61.16666667 400.        ]
Median of columns:  [ 62.5 394. ]
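
Example 5 skips the header row and addresses columns by position. As an alternative, `np.genfromtxt()` can read the header itself when `names=True` is passed, which returns a structured array whose columns are selected by name. The sketch below uses made-up in-memory text rather than one of the course datasets.

```python
import numpy as np
from io import StringIO

# Hypothetical CSV text with a header row (not one of the course datasets)
text = "temp,humidity,rainfall\n34,12,5\n36,14,7\n30,18,6\n39,11,4"

# names=True takes the column names from the first line of the input
data = np.genfromtxt(StringIO(text), delimiter=",", names=True)

print(data.dtype.names)          # column names read from the header
print(data["temp"])              # one column selected by name
print(data["rainfall"].mean())   # statistic computed on a named column
```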


## 2. Writing Data into Files
```numpy.savetxt()``` saves a NumPy array to a text file.
```
np.savetxt(fname, arr, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
```

   - **fname:** Filename or file handle. If the filename ends in ``.gz``, the file is automatically saved in compressed gzip format.
   - **arr:** 1-D or 2-D array to be saved to a text file.
   - **fmt:** A format string, or a sequence of format strings (one per column), applied to each value; default: '%.18e'.
   - **delimiter:** String or character separating columns.
   - **newline:** String or character separating lines.
   - **header:** A string that will be written at the beginning of the file.
   - **footer:** A string that will be written at the end of the file.
   - **comments:** A string that will be prepended to the ``header`` and ``footer`` strings to mark them as comments.
   
NumPy also provides binary formats:
- **np.save()** saves a single array to a binary file in NumPy .npy format.
- **np.savez()** saves several arrays into an uncompressed .npz archive.
- **np.savez_compressed()** saves several arrays into a compressed .npz archive.
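
A minimal sketch of the binary formats follows. The file names `a.npy` and `arrays.npz` and the keys `first` and `second` are arbitrary choices for this illustration, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    # Single array -> .npy file; dtype and shape are preserved
    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)
    a_loaded = np.load(npy_path)

    # Several arrays -> one .npz archive, each stored under a keyword name
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, first=a, second=b)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        b_loaded = archive["second"]

print(a_loaded)
print(names)
print(b_loaded)
```

Unlike `np.savetxt()`, these formats keep the original data type, so an integer array is read back as integers.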

In [9]:
# Example 1: Create a NumPy array, save it as a text file, then read it back from that file
arr1 = np.array([[1, 2, 3], [4, 5, 6],[7, 8, 9]])
np.savetxt('datasets/myarr.txt', arr1)
arr2 = np.genfromtxt("datasets/myarr.txt")
arr2

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [10]:
# Example 2: Create a NumPy array, save it as a csv file, then read two of its columns back
arr1 = np.array([[1, 2, 3], [4, 5, 6],[7, 8, 9]])
np.savetxt('datasets/myarr.csv', arr1, delimiter=',')
arr2 = np.genfromtxt("datasets/myarr.csv", usecols=[0, 1], delimiter=',')
arr2

array([[1., 2.],
       [4., 5.],
       [7., 8.]])

In [11]:
# Example 3: Create a NumPy array from a StringIO object, save it as a csv file, then read it back
import numpy as np
from io import StringIO   

# StringIO behaves like a file object
arr1 = StringIO("Temp Humidity Rainfall \n 34, 12, 5 \n 36, 14, 7 \n 30, 18, 6 \n 39, 11, 4")
arr2 = np.genfromtxt(arr1, dtype=int, delimiter=",", skip_header=1)

np.savetxt('datasets/weather.csv', arr2, delimiter=',')
arr3 = np.genfromtxt("datasets/weather.csv", usecols=[0, 1, 2], delimiter=',')
arr3

array([[34., 12.,  5.],
       [36., 14.,  7.],
       [30., 18.,  6.],
       [39., 11.,  4.]])
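
The examples above use the default `fmt='%.18e'`, so integers are written in scientific notation and read back as floats. The sketch below shows how `fmt` and `header` change the written text. It writes to an in-memory `StringIO` buffer instead of a file, and the column names `a,b,c` are placeholders chosen for illustration.

```python
import io
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
np.savetxt(buffer, arr, fmt="%d", delimiter=",", header="a,b,c")
text = buffer.getvalue()
print(text)

# With the default comments='# ', the header line starts with "# ",
# so genfromtxt treats it as a comment and skips it
restored = np.genfromtxt(io.StringIO(text), dtype=int, delimiter=",")
print(restored)
```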

## 3. A Sample Project

Let's download a file `climate.csv`, which contains 10,000 climate measurements (temperature, rainfall, and humidity) in the following format:

```
temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
84.00,63.00,38.00
66.00,50.00,52.00
41.00,94.00,77.00
91.00,57.00,96.00
49.00,96.00,99.00
67.00,20.00,28.00
...
```

In [12]:
import urllib.request
import numpy as np

# Workaround: the next two lines disable SSL certificate verification.
# Keep them only if the download fails with
# URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] ... unable to get local issuer certificate>
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# Download the climate.csv file from the following link
urllib.request.urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/08/climate.csv', 'climate.csv')
# Read the file into a NumPy array
# (alternative: climate_data = np.genfromtxt('climate.csv', delimiter=',', skip_header=1))
climate_data = np.loadtxt('climate.csv', delimiter=',', skiprows=1)


In [13]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [14]:
climate_data.shape

(10000, 3)

We can now perform a matrix multiplication using the `@` operator to predict the yield of apples for the entire dataset using a given set of weights.

In [15]:
weights = np.array([0.3, 0.2, 0.5])

In [16]:
yields = climate_data @ weights
# Equivalent: np.matmul is the function form of the @ operator
yields = np.matmul(climate_data, weights)

In [17]:
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [18]:
yields.shape

(10000,)
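
To see what the multiplication computes, here is a small sketch that uses only the first three rows shown in the output above. Each yield is the weighted sum of one row's temperature, rainfall, and humidity.

```python
import numpy as np

# First three rows of the climate data, copied from the output above
rows = np.array([[25., 76., 99.],
                 [39., 65., 70.],
                 [59., 45., 77.]])
weights = np.array([0.3, 0.2, 0.5])

# Matrix-vector product: one weighted sum per row
yields = rows @ weights

# The same result written out as element-wise multiply, then sum per row
manual = (rows * weights).sum(axis=1)

# The first row by hand: 25*0.3 + 76*0.2 + 99*0.5
first = 25 * 0.3 + 76 * 0.2 + 99 * 0.5

print(yields)
print(manual)
print(first)
```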

Let's add `yields` to `climate_data` as a fourth column using `np.concatenate`.
Since we wish to add a new column, we pass the argument `axis=1`; the `axis` argument specifies the dimension along which the arrays are joined. Both arrays must have the same number of dimensions, so `yields` is first reshaped from `(10000,)` to `(10000, 1)`.

In [19]:
climate_results = np.concatenate((climate_data, yields.reshape(10000, 1)), axis=1)

In [20]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])
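
The role of the reshape can be seen on a two-row sketch. The use of `reshape(-1, 1)` and `np.column_stack` below is an alternative spelling chosen for illustration, not what the cell above uses.

```python
import numpy as np

data = np.array([[25., 76., 99.],
                 [39., 65., 70.]])
yields = np.array([72.2, 59.7])

# -1 lets NumPy infer the number of rows, so the size is not hard-coded
column = yields.reshape(-1, 1)
combined = np.concatenate((data, column), axis=1)

# column_stack treats a 1-D input as a column, so no reshape is needed
stacked = np.column_stack((data, yields))

# Joining a 2-D array with a 1-D array directly is rejected
try:
    np.concatenate((data, yields), axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(combined)
print("shapes:", combined.shape, stacked.shape)
print("1-D input rejected:", mismatch_raised)
```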

Let's write the final results from our computation above back to a file using the `np.savetxt` function.

In [22]:
np.savetxt('climate_results.txt', 
           climate_results, 
           fmt='%.2f', 
           delimiter=',',
           header='temperature,rainfall,humidity,yield_apples', 
           comments='')

The results are written back in CSV format to the file `climate_results.txt`. 

```
temperature,rainfall,humidity,yield_apples
25.00,76.00,99.00,72.20
39.00,65.00,70.00,59.70
59.00,45.00,77.00,65.20
84.00,63.00,38.00,56.80
...
```
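
Because `comments=''` writes the header as a plain first line, a reader has to skip that line explicitly. The sketch below checks such a round trip on two rows. It uses an in-memory buffer instead of a file, and the header `c1,c2,c3,c4` is a placeholder.

```python
import io
import numpy as np

results = np.array([[25.0, 76.0, 99.0, 72.2],
                    [39.0, 65.0, 70.0, 59.7]])

# Write with two decimals and an unprefixed header line
buffer = io.StringIO()
np.savetxt(buffer, results, fmt="%.2f", delimiter=",",
           header="c1,c2,c3,c4", comments="")
lines = buffer.getvalue().splitlines()
print(lines)

# Read back, skipping the header line explicitly
buffer.seek(0)
restored = np.loadtxt(buffer, delimiter=",", skiprows=1)
print(restored)
```

Note that `fmt='%.2f'` rounds every value to two decimals, so data with more precision would not be restored exactly.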

