![*INTERTECHNICA - SOLON EDUCATIONAL PROGRAMS - TECHNOLOGY LINE*](https://solon.intertechnica.com/assets/IntertechnicaSolonEducationalPrograms-TechnologyLine.png)

# Data Manipulation with Python - The NumPy Library - Array Creation

*Basic initialization of the workspace.*

In [1]:
!python -m pip install numpy
import numpy as np
print("NumPy installed at version: {}".format(np.__version__))

NumPy installed at version: 1.21.5


## 1.1 Loading data for further processing

We will load the CSV data from a data stream so that it can be processed further: 

In [2]:
# import packages for remote data load
import requests
import io

# read data remotely
data_url = "https://raw.githubusercontent.com/INTERTECHNICA-BUSINESS-SOLUTIONS-SRL/CourseDataManipulationWithPython/main/Module%202%20-%20The%20Numpy%20Library/Session%202%20-%20NumPy%20Basics/data/country_happines_rank_2020.csv"
response = requests.get(data_url)

# load the string data into a structured array (one named field per CSV column)
loaded_data = np.loadtxt(
    io.StringIO(response.text), 
    skiprows = 1, 
    delimiter = ",", 
    dtype = {"names" : ("Country", "Rank", "Score", "Population"),
            "formats": ("U20", "int8", "float16", "float32")}
)

print(" A sample of the CSV loaded data is: \n {}".format(loaded_data[0:10]))

 A sample of the CSV loaded data is: 
 [('Finland',  1, 7.77 ,  5540.72 ) ('Denmark',  2, 7.6  ,  5792.202)
 ('Norway',  3, 7.555,  5421.241) ('Iceland',  4, 7.492,   341.243)
 ('Netherlands',  5, 7.49 , 17134.871)
 ('Switzerland',  6, 7.48 ,  8654.622) ('Sweden',  7, 7.344, 10099.265)
 ('New Zealand',  8, 7.31 ,  4822.233) ('Canada',  9, 7.277, 37742.152)
 ('Austria', 10, 7.246,  9006.398)]
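
The same loading pattern can be tried without network access. The following is a minimal sketch, assuming a small hypothetical CSV string (`csv_text`) in place of the downloaded response; the rows and the name `sample_dtype` are illustrative only. It uses wider numeric types than the cell above, an assumption made here to leave headroom, since `int8` cannot hold values above 127.

```python
import io
import numpy as np

# hypothetical inline CSV text standing in for the downloaded response body
csv_text = """Country,Rank,Score,Population
Finland,1,7.769,5540.72
Denmark,2,7.6,5792.202
Norway,3,7.554,5421.241
"""

# same field layout as in the cell above, with wider numeric types (an assumption)
sample_dtype = {
    "names": ("Country", "Rank", "Score", "Population"),
    "formats": ("U20", "int16", "float32", "float32"),
}

sample_data = np.loadtxt(
    io.StringIO(csv_text),
    skiprows=1,       # skip the header row
    delimiter=",",
    dtype=sample_dtype,
)

# fields are accessed by name, records by position
print(sample_data["Country"])
print(sample_data[0])
print(sample_data.dtype.names)
```

Accessing a field by name returns a regular one-dimensional array, which is what the processing step relies on when it reads `loaded_data["Score"]`.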


## 1.2 Processing the loaded data 

We will process the data by adding, for each country, the percentage difference between its score and the score of the next country in the ranking:

In [3]:
# iterate over the elements in the array and calculate the percentage difference
# based on the score of the next country in the ranking
loaded_data_size = loaded_data.shape[0]
loaded_data_scores = loaded_data["Score"]
percentual_difference_data = [100 * (loaded_data_scores[i] - loaded_data_scores[i+1]) / loaded_data_scores[i]
                              for i in range(0, loaded_data_size - 1)]
# the last country has no successor, so pad with 0 to match the length of the loaded data
percentual_increase_data = np.append(
    percentual_difference_data,
    0
)

# create a record array holding the padded percentage difference values
percentual_difference = np.array(
  percentual_increase_data,
  dtype=[("Percentual Increase", "<f16")]
)
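
The list comprehension in the cell above can also be written with array slicing, which avoids the explicit Python loop. This is a minimal sketch on hypothetical scores (the `scores` values are made up for illustration), not a replacement for the course code.

```python
import numpy as np

# hypothetical scores, already ordered from the best rank to the worst
scores = np.array([8.0, 6.0, 4.5, 4.5], dtype=np.float64)

# scores[:-1] holds every score except the last, scores[1:] every score except the first,
# so the element-wise expression compares each country with the next one in the ranking
pct_diff = 100 * (scores[:-1] - scores[1:]) / scores[:-1]

# the last country has no successor; pad with 0 so the length matches the input
pct_diff = np.append(pct_diff, 0)

print(pct_diff)
```

The padded result has the same length as `scores`, which matters when the values are later attached to the original records as a new column.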

In [4]:
from numpy.lib import recfunctions as rfn

# merge the original array with the percentual differences
merged_array = rfn.merge_arrays(
    (loaded_data, percentual_difference),
    asrecarray=True,
    flatten=True
)

print(
    " A sample of the merged data is: \n{}\n with the data structure \n{}".format(
        merged_array[0:10],
        merged_array.dtype
    )
)

 A sample of the merged data is: 
[('Finland',  1, 7.77 ,  5540.72 , 2.1618904 )
 ('Denmark',  2, 7.6  ,  5792.202, 0.61664954)
 ('Norway',  3, 7.555,  5421.241, 0.82730093)
 ('Iceland',  4, 7.492,   341.243, 0.05213764)
 ('Netherlands',  5, 7.49 , 17134.871, 0.10432968)
 ('Switzerland',  6, 7.48 ,  8654.622, 1.82767624)
 ('Sweden',  7, 7.344, 10099.265, 0.4787234 )
 ('New Zealand',  8, 7.31 ,  4822.233, 0.42757883)
 ('Canada',  9, 7.277, 37742.152, 0.42941492)
 ('Austria', 10, 7.246,  9006.398, 0.26954178)]
 with the data structure 
(numpy.record, [('Country', '<U20'), ('Rank', 'i1'), ('Score', '<f2'), ('Population', '<f4'), ('Percentual Increase', '<f16')])
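
When only one column needs to be added, `rfn.append_fields` from the same `recfunctions` module is an alternative to merging two arrays. The sketch below is illustrative only: the records, the field name `Difference` and its values are hypothetical, and this is not the approach used in the cells above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# hypothetical base records
base = np.array(
    [("Finland", 1), ("Denmark", 2)],
    dtype=[("Country", "U20"), ("Rank", "int16")],
)

# hypothetical values for the new column, one per record
extra = np.array([2.5, 0.0], dtype=np.float64)

# append_fields returns a new array; usemask=False yields a plain structured array
# instead of a masked one
extended = rfn.append_fields(base, "Difference", extra, usemask=False)

print(extended.dtype.names)
print(extended["Difference"])
```

The original `base` array is left unchanged, because structured arrays have a fixed layout and adding a field always produces a new array.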


## 1.3 Saving the loaded data 

Once the data has been loaded and processed, the next natural step would be to store it for further processing. The NumPy library supports the saving of data in CSV format via the [**savetxt**](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html) function.

The function's parameters control the target file (or data stream), the format used for each saved column, and the characters used as the delimiter and the line terminator.

In [5]:
# save the data in a CSV format
np.savetxt(
    # the filename
    "./country_happines_rank_2020_processed.csv",
    # the data
    merged_array,
    # the CSV header line, built from the field names
    header = ",".join(merged_array.dtype.names),
    # the format for saved data
    fmt = ["%s", "%d", "%f", "%f", "%f"],
    
    # the delimiter character
    delimiter = ","
)

Let's display a sample of the content of the file as well:

In [7]:
# open the file and read its lines; the with block closes the file afterwards
with open("./country_happines_rank_2020_processed.csv") as csv_file:
    content = csv_file.readlines()

# display a content sample
print(
    "A sample of the file content is:"
)

content[0:10]

['# Country,Rank,Score,Population,Percentual Increase\n',
 'Finland,1,7.769531,5540.720215,2.161890\n',
 'Denmark,2,7.601562,5792.202148,0.616650\n',
 'Norway,3,7.554688,5421.241211,0.827301\n',
 'Iceland,4,7.492188,341.243011,0.052138\n',
 'Netherlands,5,7.488281,17134.871094,0.104330\n',
 'Switzerland,6,7.480469,8654.622070,1.827676\n',
 'Sweden,7,7.343750,10099.264648,0.478723\n',
 'New Zealand,8,7.308594,4822.232910,0.427579\n',
 'Canada,9,7.277344,37742.152344,0.429415\n']
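
The header line in the output above starts with `# ` because `savetxt` prefixes header lines with its `comments` string, which defaults to `"# "`. The following minimal sketch, assuming hypothetical records and an in-memory `io.StringIO` stream instead of a file on disk, passes `comments=""` to write a plain header row.

```python
import io
import numpy as np

# hypothetical records with one text field and two numeric fields
records = np.array(
    [("Finland", 1, 7.5), ("Denmark", 2, 7.25)],
    dtype=[("Country", "U20"), ("Rank", "int16"), ("Score", "float64")],
)

# write to an in-memory text stream rather than to a file
buffer = io.StringIO()
np.savetxt(
    buffer,
    records,
    header=",".join(records.dtype.names),
    fmt=["%s", "%d", "%.3f"],   # one format per field
    delimiter=",",
    comments="",                # no "# " prefix on the header line
)

csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

A header without the prefix is easier to consume with tools that expect the first line to hold plain column names, while the default prefix lets `np.loadtxt` skip the header as a comment.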