In [2]:
import numpy as np

## Working with CSV data files

NumPy also provides helper functions for reading from and writing to files. Let's download a file `climate.txt`, which contains 10,000 climate measurements (temperature, rainfall, and humidity) in the following format:


```
temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
84.00,63.00,38.00
66.00,50.00,52.00
41.00,94.00,77.00
91.00,57.00,96.00
49.00,96.00,99.00
67.00,20.00,28.00
...
```

This format of storing data is known as *comma-separated values* or CSV. 

> **CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)


To read this file into a numpy array, we can use the `genfromtxt` function.

In [3]:
# The urllib module lets us download data stored at an external URL
# Here we fetch the climate data for many regions
import urllib.request
urllib.request.urlretrieve(
'https://gist.github.com/BirajCoder/a4ffcb76fd6fb221d76ac2ee2b8584e9/raw/4054f90adfd361b7aa4255e99c2e874664094cea/climate.csv', 
    'climate.txt'
)

# urlretrieve takes two arguments: the URL to download from and the local path where the file is saved

('climate.txt', <http.client.HTTPMessage at 0x244335915b0>)

#### Before we can work with the file, we first have to convert it to a format that the NumPy library can handle


In [4]:
help(np.genfromtxt)

Help on function genfromtxt in module numpy:

genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+,-./:;<=>?@[\\]^{|}~", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding=None, *, ndmin=0, like=None)
    Load data from a text file, with missing values handled as specified.

    Each line past the first `skip_header` lines is split at the `delimiter`
    character, and characters following the `comments` character are discarded.

    Parameters
    ----------
    fname : file, str, pathlib.Path, list of str, generator
        File, filename, list, or generator to read.  If the filename
        extension is ``.gz`` or ``.bz2``, the file is first decompressed. Note
        that generators must return bytes or

In [5]:
# NumPy's built-in genfromtxt function converts numeric text data directly into a NumPy array
climate_data = np.genfromtxt('climate.txt', delimiter=',', skip_header=1)
# The call above reads the CSV file as text, splits each line at the commas,
# and skips the header row
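
To see what `delimiter` and `skip_header` do without downloading anything, here is a minimal sketch that parses a small in-memory sample. The `sample_csv` and `sample_data` names are hypothetical, and the three rows are copied from the format example above.

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for climate.txt (first three rows of the sample)
sample_csv = io.StringIO(
    "temperature,rainfall,humidity\n"
    "25.00,76.00,99.00\n"
    "39.00,65.00,70.00\n"
    "59.00,45.00,77.00\n"
)

# delimiter=',' splits each line at the commas; skip_header=1 drops the column names
sample_data = np.genfromtxt(sample_csv, delimiter=',', skip_header=1)

print(sample_data.shape)  # one row per record, one column per field
print(sample_data.dtype)  # genfromtxt parses values as floats by default
```

Without `skip_header=1`, the header line would be parsed too, and its text fields would show up as `nan` values in the first row.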

In [17]:
climate_data
climate_data.shape

(10000, 3)

In [18]:
# Define one weight per column and check the shapes of both arrays
weight=np.array([0.3,0.2,0.5])
climate_data.shape
weight.shape

(3,)

In [8]:
# Now that the climate data is a NumPy array,
# we can use matrix multiplication to compute the final yield for each row
_yeilds_ = climate_data @ weight

In [9]:
_yeilds_

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])
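
To make the matrix multiplication concrete, the sketch below repeats it on the first three sample rows and compares it with the weighted sum written out explicitly. The names `rows`, `by_matmul`, and `by_hand` are hypothetical. For the first row, the calculation is 25 × 0.3 + 76 × 0.2 + 99 × 0.5 = 7.5 + 15.2 + 49.5 = 72.2, which matches the first value in the output above.

```python
import numpy as np

# First three rows of the sample data shown earlier
rows = np.array([
    [25.0, 76.0, 99.0],
    [39.0, 65.0, 70.0],
    [59.0, 45.0, 77.0],
])
weight = np.array([0.3, 0.2, 0.5])

# (3, 3) @ (3,) gives shape (3,): one weighted sum per row
by_matmul = rows @ weight

# The same computation spelled out: scale each column by its weight,
# then add up the columns within each row
by_hand = (rows * weight).sum(axis=1)

print(by_matmul)
print(np.allclose(by_matmul, by_hand))
```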

Let's add the yields (stored in `_yeilds_`) to `climate_data` as a fourth column using the [`np.concatenate`](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html) function.

In [10]:
# The concatenate function joins arrays along an existing axis
# For 2-D arrays, the axis parameter is either 0 or 1
# axis=0 stacks the arrays vertically, adding rows
# axis=1 places the arrays side by side, adding columns
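
A minimal sketch of both axis choices on small, made-up arrays (the names `a`, `b`, and `c` are hypothetical):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # shape (2, 2)
b = np.array([[5, 6]])          # shape (1, 2): one extra row
c = np.array([[7], [8]])        # shape (2, 1): one extra column

# axis=0 appends b below a; the column counts must match
rows_added = np.concatenate((a, b), axis=0)

# axis=1 appends c to the right of a; the row counts must match
cols_added = np.concatenate((a, c), axis=1)

print(rows_added.shape)  # (3, 2)
print(cols_added.shape)  # (2, 3)
```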

In [11]:
# Combine the data: reshape the 1-D yields into a single column (-1 infers the row count)
yield_results = np.concatenate((climate_data, _yeilds_.reshape(-1, 1)), axis=1)

In [12]:
yield_results.shape
print(yield_results)

[[25.  76.  99.  72.2]
 [39.  65.  70.  59.7]
 [59.  45.  77.  65.2]
 ...
 [99.  62.  58.  71.1]
 [70.  71.  91.  80.7]
 [92.  39.  76.  73.4]]


In [13]:
# What does reshape do, and why is it needed inside np.concatenate((...), axis=...)?
help(np.reshape)

Help on _ArrayFunctionDispatcher in module numpy:

reshape(a, newshape, order='C')
    Gives a new shape to an array without changing its data.

    Parameters
    ----------
    a : array_like
        Array to be reshaped.
    newshape : int or tuple of ints
        The new shape should be compatible with the original shape. If
        an integer, then the result will be a 1-D array of that length.
        One shape dimension can be -1. In this case, the value is
        inferred from the length of the array and remaining dimensions.
    order : {'C', 'F', 'A'}, optional
        Read the elements of `a` using this index order, and place the
        elements into the reshaped array using this index order.  'C'
        means to read / write the elements using C-like index order,
        with the last axis index changing fastest, back to the first
        axis index changing slowest. 'F' means to read / write the
        elements using Fortran-like index order, with the first index
     
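
In this notebook the reshape is needed because the yields form a 1-D array, while `np.concatenate` requires all inputs to have the same number of dimensions. The sketch below shows this on a three-row sample; the names `values`, `column`, `table`, and `combined` are hypothetical.

```python
import numpy as np

values = np.array([72.2, 59.7, 65.2])  # shape (3,): one-dimensional
column = values.reshape(-1, 1)         # shape (3, 1); -1 lets NumPy infer the row count

table = np.array([[25.0, 76.0, 99.0],
                  [39.0, 65.0, 70.0],
                  [59.0, 45.0, 77.0]])

# Passing the 1-D array directly fails: a 2-D and a 1-D array cannot be concatenated
try:
    np.concatenate((table, values), axis=1)
except ValueError as err:
    print("1-D input rejected:", err)

# The reshaped column has two dimensions, so it can be appended as a fourth column
combined = np.concatenate((table, column), axis=1)
print(combined.shape)  # (3, 4)
```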

In [14]:
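# Now we can save the results to a new file, adding a header row with the column names
# delimiter=',' keeps the data rows comma-separated, matching the header
np.savetxt("Final_Results.txt", yield_results, fmt='%.2f', delimiter=',', header='Temperature,Rainfall,Humidity,Yield Results', comments='')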
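
One way to check a saved file is to read it back. The sketch below is a small round trip under stated assumptions: it writes two sample rows to a temporary file with a hypothetical name, uses a comma delimiter so the data rows match the comma-separated header, and reloads the file with `genfromtxt`.

```python
import os
import tempfile
import numpy as np

# Two sample rows: temperature, rainfall, humidity, yield
results = np.array([[25.0, 76.0, 99.0, 72.2],
                    [39.0, 65.0, 70.0, 59.7]])

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "results_sample.txt")  # hypothetical file name

    # comments='' stops savetxt from prefixing the header line with '# '
    np.savetxt(out_path, results, fmt='%.2f', delimiter=',',
               header='temperature,rainfall,humidity,yield', comments='')

    with open(out_path) as f:
        first_line = f.readline().strip()

    # Read the file back, skipping the header row we just wrote
    reloaded = np.genfromtxt(out_path, delimiter=',', skip_header=1)

print(first_line)
print(np.allclose(reloaded, results))
```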