![*INTERTECHNICA - SOLON EDUCATIONAL PROGRAMS - TECHNOLOGY LINE*](https://solon.intertechnica.com/assets/IntertechnicaSolonEducationalPrograms-TechnologyLine.png)

# Python for Data Processing - Creating Arrays

*NumPy offers several ways to create arrays, from building them out of basic Python types to loading data from external sources*.

Initialize the environment by installing NumPy:

In [1]:
!python -m pip install numpy



## 1. Creating from arrays and lists

An array can be created from simple Python lists; a nested list produces a two-dimensional array:

In [2]:
import numpy as np

In [3]:
x_1d = np.array([ 1,   2,   3,   4,   5,   6,   7,   8,   9,  10])
print(x_1d)

[ 1  2  3  4  5  6  7  8  9 10]


In [4]:
x_2d = np.array(
      [[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])

print(x_2d)

[[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]]
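To check what was actually created, it can help to inspect a few array attributes. The sketch below uses small stand-in arrays (the names `a_1d`, `a_2d` and `a_mixed` are illustrative and not part of the original notebook) and shows the number of dimensions, the shape and the element type.

```python
import numpy as np

# Small stand-ins for the arrays created above (illustrative names)
a_1d = np.array([1, 2, 3, 4, 5])
a_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(a_1d.ndim, a_1d.shape)  # 1 (5,)
print(a_2d.ndim, a_2d.shape)  # 2 (2, 3)

# All elements share one type; mixing ints and floats upcasts to float
a_mixed = np.array([1, 2.5, 3])
print(a_mixed.dtype)  # float64
```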


NumPy also allows creation of arrays from tuples:

In [5]:
x_1d_list = np.array(( 1,   2,   3,   4,   5,   6,   7,   8,   9,  10))
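As a quick check, the following sketch (with illustrative variable names) suggests that the same values yield the same array whether they are given as a list or as a tuple, and that nesting works the same way.

```python
import numpy as np

from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))

# Same values, same array, regardless of the input sequence type
print(np.array_equal(from_list, from_tuple))  # True

# Nested tuples give a two-dimensional array, just like nested lists
nested = np.array(((1, 2), (3, 4)))
print(nested.shape)  # (2, 2)
```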

By passing a list of tuples together with a **dtype** that names each field, it is possible to create **arrays of structured types** (records with named fields) as well:

In [6]:
x_1d_complex = np.array(
    [("One", 1), ("Two", 2), ("Three", 3)],
    dtype=[("Literal Form", "U10"), ("Numeric Value", "i1")])

print(x_1d_complex)

[('One', 1) ('Two', 2) ('Three', 3)]


In [7]:
print(x_1d_complex["Literal Form"])

['One' 'Two' 'Three']


In [8]:
print(x_1d_complex["Numeric Value"])

[1 2 3]
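Field access combines naturally with indexing and boolean masks. The sketch below rebuilds a similar array with shorter, hypothetical field names (`name`, `value`) and shows how to read a single record and how to filter records by a field.

```python
import numpy as np

# Same data as above, with shorter (hypothetical) field names
numbers = np.array(
    [("One", 1), ("Two", 2), ("Three", 3)],
    dtype=[("name", "U10"), ("value", "i1")])

# A single record behaves like a small row with named fields
first = numbers[0]
print(first["name"], first["value"])  # One 1

# A boolean mask built from one field selects whole records
large = numbers[numbers["value"] >= 2]
print(large["name"])  # ['Two' 'Three']

# The field names are available on the dtype
print(numbers.dtype.names)  # ('name', 'value')
```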


## 2. Creating via dedicated functions

NumPy also provides dedicated functions for creating arrays.

One of these functions is **arange**, which creates a one-dimensional array of evenly spaced values. Called with a single argument n, it returns the integers from 0 to n - 1.

In [9]:
x_1d_arange = np.arange(10)
print(x_1d_arange)

[0 1 2 3 4 5 6 7 8 9]
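**arange** also accepts a start, a stop and a step, where the stop value itself is excluded. A short sketch with illustrative variable names:

```python
import numpy as np

# start=2, stop=11 (excluded), step=2
evens = np.arange(2, 11, 2)
print(evens.tolist())      # [2, 4, 6, 8, 10]

# a negative step counts down
countdown = np.arange(5, 0, -1)
print(countdown.tolist())  # [5, 4, 3, 2, 1]

# a fractional step produces a float array
halves = np.arange(0, 2, 0.5)
print(halves.tolist())     # [0.0, 0.5, 1.0, 1.5]
```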


Another useful function is **empty**, which allocates an array of a given shape without initializing its contents. The values shown below are whatever happened to be in memory and will differ between runs.

In [10]:
x_1d_empty = np.empty(10)
print(x_1d_empty)

[6.93205006e-310 6.93205012e-310 6.93205006e-310 6.93205006e-310
 6.93205019e-310 6.93205019e-310 6.93202642e-310 6.93205019e-310
 6.93205099e-310 5.58294180e-322]
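Because the initial contents are arbitrary, an array created with **empty** is typically used only when every element is overwritten before being read. A minimal sketch of that pattern (the variable name and the squares computation are illustrative):

```python
import numpy as np

# Allocate space for 5 integers; the contents are arbitrary at this point
squares = np.empty(5, dtype=np.int64)

# Overwrite every element before reading any of them
for i in range(squares.size):
    squares[i] = i * i

print(squares)  # [ 0  1  4  9 16]
```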


In [11]:
x_2d_empty = np.empty((10,10))
print(x_2d_empty)

[[6.93202376e-310 5.43472210e-323 0.00000000e+000 3.95252517e-323
  6.93192365e-310 0.00000000e+000 6.93202628e-310 0.00000000e+000
  6.93202628e-310 0.00000000e+000]
 [6.93202628e-310 0.00000000e+000 2.49009086e-321 9.88131292e-324
  6.93202628e-310 4.66480655e-310 0.00000000e+000 7.90505033e-323
              nan 0.00000000e+000]
 [6.93192365e-310 3.55727265e-321 4.44659081e-323 6.93202628e-310
  4.66480655e-310 4.94065646e-324 7.90505033e-323             nan
  6.93202631e-310 6.93192365e-310]
 [6.93202628e-310 0.00000000e+000 0.00000000e+000 0.00000000e+000
  0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
  0.00000000e+000 0.00000000e+000]
 [6.93202628e-310 0.00000000e+000 6.93202628e-310 0.00000000e+000
  6.93202628e-310 0.00000000e+000 6.93202628e-310 6.93202628e-310
  3.91299992e-321 9.88131292e-324]
 [6.93202628e-310 4.66480655e-310 0.00000000e+000 7.90505033e-323
  0.00000000e+000 0.00000000e+000 0.00000000e+000 9.88131292e-324
  6.93202628e-310 4.66478335e-310

The **zeros** function can be used to create arrays initialized with 0 values: 

In [12]:
x_1d_zeros = np.zeros(10)
print(x_1d_zeros)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [13]:
x_2d_zeros = np.zeros((10,10))
print(x_2d_zeros)

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]


The **ones** function behaves like the **zeros** function but initializes the array with 1 values.

The **full** function creates arrays of a given shape filled with a specified value:
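For completeness, a short sketch of **ones** with illustrative variable names. As with **zeros**, the default element type is float, and the optional **dtype** argument changes it.

```python
import numpy as np

ones_1d = np.ones(5)
print(ones_1d)  # [1. 1. 1. 1. 1.]

# dtype=int gives integer ones instead of the default floats
ones_2d = np.ones((2, 3), dtype=int)
print(ones_2d.tolist())  # [[1, 1, 1], [1, 1, 1]]
```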

In [14]:
x_1d_full = np.full(10, -1)
print(x_1d_full)

[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]


In [15]:
x_2d_full = np.full((10,10), -1)
print(x_2d_full)

[[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]]
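Unlike **zeros**, the **full** function derives the element type from the fill value unless a **dtype** is given, which explains the integer output above. A brief sketch with illustrative names:

```python
import numpy as np

full_int = np.full(3, -1)                  # integer fill value -> integer array
full_float = np.full((2, 2), 0.5)          # float fill value -> float array
full_forced = np.full(3, -1, dtype=float)  # dtype overrides the inferred type

print(full_int.dtype.kind, full_float.dtype.kind, full_forced.dtype.kind)  # i f f
print(full_forced)  # [-1. -1. -1.]
```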


An array can also be created with random content by using the **random** function from the **numpy.random** module, which draws values uniformly from the half-open interval [0, 1):

In [16]:
x_1d_random = np.random.random(10)
print(x_1d_random)

[0.5173243  0.50980414 0.94398728 0.48420488 0.78291118 0.10357251
 0.34889221 0.76528306 0.50048503 0.55553144]
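The values above change on every run. When reproducible results are needed, one option is a seeded generator created with **default_rng** (available in NumPy 1.17 and later). The sketch below is an alternative to the call used in this notebook, and the seed value 42 is arbitrary.

```python
import numpy as np

# A generator with a fixed seed produces the same sequence on every run
rng = np.random.default_rng(seed=42)
sample_1d = rng.random(5)        # 5 values in [0, 1)
sample_2d = rng.random((2, 3))   # 2 x 3 values in [0, 1)

# A second generator with the same seed reproduces the same values
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(sample_1d, rng_again.random(5)))  # True
```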


In [17]:
x_2d_random = np.random.random((10, 10))
print(x_2d_random)

[[0.87148379 0.79940125 0.11614865 0.99513655 0.87100105 0.7142052
  0.92614765 0.14347957 0.51730657 0.28283052]
 [0.3954685  0.97098503 0.04777946 0.49987938 0.5587678  0.90215673
  0.27493523 0.05797381 0.879331   0.4323666 ]
 [0.23138751 0.43883196 0.97050638 0.50155029 0.35834456 0.31837864
  0.00927068 0.90245258 0.7238452  0.46806829]
 [0.82035135 0.60118935 0.25417719 0.30314345 0.5155956  0.71969464
  0.23176469 0.16539049 0.66904358 0.32911093]
 [0.51496694 0.22832637 0.2192125  0.86356019 0.28444968 0.24605818
  0.26306795 0.32825842 0.68277818 0.69825317]
 [0.48247559 0.271428   0.93429849 0.45862553 0.97655133 0.09122789
  0.19333361 0.28736221 0.70051214 0.10260642]
 [0.49219914 0.41933213 0.53787834 0.14057413 0.19215617 0.0037713
  0.80598402 0.83529914 0.38918266 0.31968485]
 [0.66131673 0.65166385 0.37121191 0.57883176 0.85629854 0.79295485
  0.7549     0.67507067 0.63450021 0.67020349]
 [0.68491502 0.69216207 0.10074644 0.13063579 0.54283063 0.94173883
  0.85775559 0

## 3. Creating via data loading

NumPy can load text-based data (typically CSV) via the **loadtxt** function. Its parameters control the loading operation, for example the number of rows to skip, the field delimiter, or a structured NumPy type to associate with the data.

In [18]:
# import packages for remote data load
import requests
import io

# read data remotely and fail early on an HTTP error
data_url = "https://raw.githubusercontent.com/INTERTECHNICA-BUSINESS-SOLUTIONS-SRL/Applying-Python-in-Machine-Learning/master/notebooks/numpy_002_creating_arrays_happines_rank_2020.csv"
response = requests.get(data_url)
response.raise_for_status()

# load the string data into a structured array
# note: "U20" keeps at most 20 characters, so longer country names are truncated
# note: "int16" is used for Rank because "int8" cannot hold values above 127
loaded_data = np.loadtxt(
    io.StringIO(response.text),
    skiprows = 1,
    delimiter = ",",
    dtype = {"names" : ("Country", "Rank", "Score", "Population"),
             "formats": ("U20", "int16", "float16", "float32")}
)

In [19]:
print(loaded_data.shape)

(152,)


In [20]:
print(loaded_data[0])

('Finland', 1, 7.77, 5540.72)


In [21]:
print(loaded_data["Country"])

['Finland' 'Denmark' 'Norway' 'Iceland' 'Netherlands' 'Switzerland'
 'Sweden' 'New Zealand' 'Canada' 'Austria' 'Australia' 'Costa Rica'
 'Israel' 'Luxembourg' 'United Kingdom' 'Ireland' 'Germany' 'Belgium'
 'United States' 'Czech Republic' 'United Arab Emirates' 'Malta' 'Mexico'
 'France' 'Taiwan' 'Chile' 'Guatemala' 'Saudi Arabia' 'Qatar' 'Spain'
 'Panama' 'Brazil' 'Uruguay' 'Singapore' 'El Salvador' 'Italy' 'Bahrain'
 'Slovakia' 'Trinidad and Tobago' 'Poland' 'Uzbekistan' 'Lithuania'
 'Colombia' 'Slovenia' 'Nicaragua' 'Argentina' 'Romania' 'Ecuador'
 'Kuwait' 'Thailand' 'Latvia' 'South Korea' 'Estonia' 'Jamaica'
 'Mauritius' 'Japan' 'Honduras' 'Kazakhstan' 'Bolivia' 'Hungary'
 'Paraguay' 'Cyprus' 'Peru' 'Portugal' 'Pakistan' 'Russia' 'Philippines'
 'Serbia' 'Moldova' 'Libya' 'Montenegro' 'Tajikistan' 'Croatia'
 'Hong Kong' 'Dominican Republic' 'Bosnia and Herzegovi' 'Turkey'
 'Malaysia' 'Belarus' 'Greece' 'Mongolia' 'Nigeria' 'Kyrgyzstan'
 'Turkmenistan' 'Algeria' 'Morocco' 'Azerbaij

In [22]:
print(loaded_data[-1]["Country"])

South Sudan
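The same loading pattern can be tried without network access by reading from an in-memory string. The sketch below uses a few made-up sample rows (the names, ranks and scores are hypothetical, not taken from the data set above) to show **skiprows**, **delimiter** and a structured **dtype** working together.

```python
import io
import numpy as np

# Made-up CSV content with a header row (hypothetical sample data)
csv_text = "Name,Rank,Score\nAlpha,1,7.5\nBeta,2,6.25\nGamma,3,5.0\n"

records = np.loadtxt(
    io.StringIO(csv_text),
    skiprows=1,       # skip the header line
    delimiter=",",
    dtype={"names": ("Name", "Rank", "Score"),
           "formats": ("U10", "int16", "float64")})

print(records.shape)            # (3,)
print(records["Name"])          # ['Alpha' 'Beta' 'Gamma']
print(records["Score"].mean())  # 6.25
```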
