Title: Arrays
Date: 2019-12-06 11:30
Slug: arrays

An array is an object that contains objects of the same type; while lists can hold objects of varying types, arrays cannot [1].

This post will provide some basic information about arrays related to their shape and dimensionality.
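
To illustrate the same-type point, the short sketch below (not part of the original notebook) converts a list of mixed numeric types into a NumPy array. The names `mixed_list` and `as_array` are made up for the illustration. NumPy settles on one common dtype for every element; an array created with `dtype=object` is an exception that can hold arbitrary Python objects.

```python
import numpy as np

mixed_list = [1, 2.5, 3]          # a list keeps each element's own type
as_array = np.array(mixed_list)   # NumPy converts all elements to one common dtype

print([type(v).__name__ for v in mixed_list])  # ['int', 'float', 'int']
print(as_array.dtype)                          # float64
```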

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

In [2]:
# Some ways an array can be created are as follows:

# Option 1: Converts a list to an array.

a1 = np.array([1, 2, 3, 4, 5, 6])
print('Option 1:')
print(a1)

# Option 2: Creates an array of zeros.

a2 = np.zeros((6,1))
print('Option 2:')
print(a2)

Option 1:
[1 2 3 4 5 6]
Option 2:
[[0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]


In [3]:
# After creating an array with NumPy, its shape is available through the .shape attribute.

# The shape of a one-dimensional array is a one-element tuple holding the number of elements in it.

print(a1.shape)

# The shape of a two-dimensional array has the form of (n,m), indicating n rows and m columns [2].

print(a2.shape)

(6,)
(6, 1)


Array 'a1' is one-dimensional: its shape is described by one number, the 6 elements it holds.

Array 'a2' is two-dimensional: its shape is described by two numbers, 6 rows and 1 column.
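
As a small sketch of the same idea, the number of dimensions can also be read directly from the `ndim` attribute, which equals the length of the shape tuple. The arrays below repeat `a1` and `a2` from above so the block runs on its own.

```python
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6])
a2 = np.zeros((6, 1))

# ndim is the number of dimensions; size is the total number of elements.
for name, arr in [("a1", a1), ("a2", a2)]:
    print(name, "ndim:", arr.ndim, "shape:", arr.shape, "size:", arr.size)
```

Both arrays hold 6 elements; only the number of dimensions used to arrange them differs.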

In [4]:
# Indexing a one-dimensional array returns a single number.

print('One-dimensional array:')
print(a1[0])
print(a1[1])
print(a1[2])
print(a1[3])
print(a1[4])
print(a1[5])

# Indexing a two-dimensional array with a single index returns a one-dimensional array (one row).

print('Two-dimensional array:')
print(a2[0])
print(a2[1])
print(a2[2])
print(a2[3])
print(a2[4])
print(a2[5])

One-dimensional array:
1
2
3
4
5
6
Two-dimensional array:
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
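
The output above comes from indexing with a single index. As a brief sketch (the variable names `row`, `value` and `column` are illustrative only), supplying both a row and a column index to a two-dimensional array returns a single number instead, and a slice can pull out a whole column.

```python
import numpy as np

a2 = np.zeros((6, 1))

row = a2[0]        # one index: the first row, itself a 1D array
value = a2[0, 0]   # two indices (row, column): a single number
column = a2[:, 0]  # every row of column 0, as a 1D array

print(row.shape)     # (1,)
print(value)         # 0.0
print(column.shape)  # (6,)
```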


Below, array 'b' will be created as a 1D object and then converted to a 2D object.

In [5]:
# Original shape of array 'b'.

b = np.array([1, 2, 3, 4, 5, 6])

# Check the shape.

b.shape

(6,)

In [6]:
# Reshape the object by calling its reshape method.
# Array 'b' can now be described as having 6 rows and 1 column.

b = b.reshape(6,1)
b

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])

In [7]:
# Indexing each row returns a 1D array.

print(b[0])
print(b[1])
print(b[2])
print(b[3])
print(b[4])
print(b[5])

[1]
[2]
[3]
[4]
[5]
[6]


In [8]:
# Array 'c' below is a 1D array consisting of 6 elements and described with 1 number.

c = np.array([1,2,3,4,5,6])
c.shape

(6,)

In [9]:
# This can be transformed to a 2D array and be expressed with 2 numbers (rows and columns).

c.reshape(6,1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])
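
Note that the cell above does not assign the result back to 'c'. A minimal sketch of what that implies: `reshape` returns a new array object with the requested shape, and the shape of the original stays as it was unless the result is reassigned (as was done with array 'b' earlier). The name `reshaped` is used here only for illustration.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5, 6])

reshaped = c.reshape(6, 1)  # the result is a separate array object

print(c.shape)         # (6,)
print(reshaped.shape)  # (6, 1)
```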

Alternatively, if the number of desired columns m is known, an array can be reshaped with (-1,m).

'-1' tells NumPy to infer the number of rows from the total number of elements, so that the result has the m columns requested [2].

In [10]:
# Below, a reshape to a one-column array is requested.
    
c.reshape(-1,1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])

In [11]:
# Shape of reshaped array.

c.reshape(-1,1).shape

(6, 1)

The reverse also holds: with (n,-1), NumPy creates n rows and infers the number of columns from the total number of elements [2].

In [12]:
# Below, a reshape to a one-row array is requested.

c.reshape(1,-1)

array([[1, 2, 3, 4, 5, 6]])

In [13]:
# Shape of reshaped array.

c.reshape(1,-1).shape

(1, 6)
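
The inference behind -1 can be sketched as a simple division: NumPy divides the total number of elements by the dimension that was given. The example below is an illustration rather than part of the original notebook; it also shows that a `ValueError` is raised when the 6 elements cannot be split evenly into the requested shape. The flag `reshape_failed` is a made-up name.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5, 6])

print(c.reshape(-1, 2).shape)  # (3, 2): 6 elements / 2 columns = 3 rows
print(c.reshape(2, -1).shape)  # (2, 3): 6 elements / 2 rows = 3 columns

# 6 elements cannot be arranged into rows of 4 columns.
reshape_failed = False
try:
    c.reshape(-1, 4)
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```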

In [14]:
# Arrays can be 2D with a variety of shapes beyond one column.
# Below is a reshape request with 3 rows and 2 columns.

c.reshape(3,2)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [15]:
# Below is a reshape request for 2 rows and 3 columns.
# Even though each row holds three elements (3 columns), the array is still two-dimensional.

c.reshape(2,3)

array([[1, 2, 3],
       [4, 5, 6]])

In [16]:
# As noted earlier, array 'c' can be reshaped with -1 in one position when the other dimension is known.

c.reshape(-1,3)

array([[1, 2, 3],
       [4, 5, 6]])

In [17]:
# If array 'c' is then changed to a 2D shape and then indexed, the output will be a 1D array.

c = c.reshape(-1,3)

c[0]

array([1, 2, 3])

In [18]:
# Again reshaping with -1 in the column position.

c.reshape(3,-1)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [19]:
# Reshape and index.

c = c.reshape(3,-1)

c[0]

array([1, 2])


When using arrays in scikit-learn with LinearRegression, the objects passed to the model must have the proper shape [3].

In [20]:
# LinearRegression example.

X = np.array([1, 2, 3, 4, 5])
y = [6, 7, 8, 9, 10]

X.shape

(5,)

In [21]:
model = LinearRegression()
model.fit(X,y)

ValueError: Expected 2D array, got 1D array instead:
array=[1 2 3 4 5].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The error above arises because the fit method expects X to be a 2D array.

The reshape suggestion in the message, array.reshape(-1, 1), turns the 1D array into a 2D array with a single column.
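
The two suggestions in the error message lead to different layouts. As a sketch based on the wording of that message, rows are read as samples and columns as features; the names `as_single_feature` and `as_single_sample` are illustrative only. Since `y` holds 5 values here, the first layout (5 samples, 1 feature) is the one that matches.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])

as_single_feature = X.reshape(-1, 1)  # 5 rows (samples), 1 column (feature)
as_single_sample = X.reshape(1, -1)   # 1 row (sample), 5 columns (features)

print(as_single_feature.shape)  # (5, 1)
print(as_single_sample.shape)   # (1, 5)
```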

Below are a few array reshape options (assuming X is already an array):

Option 1:

    X = X.reshape(-1,1)

Option 2: 

    model.fit(X.reshape(-1,1), y)

In [22]:
# Reshaping using Option 1:

X = np.array([1, 2, 3, 4, 5])
y = [6, 7, 8, 9, 10]

X = X.reshape(-1,1)

model = LinearRegression()
model.fit(X,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

The array now has the proper shape for fitting the LinearRegression model.
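
Besides `reshape(-1,1)`, NumPy offers other ways to add a second dimension of length 1. The sketch below is an aside rather than something the original notebook used; it compares two of them, indexing with `np.newaxis` and calling `np.expand_dims`, and checks that they give the same single-column array.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])

via_reshape = X.reshape(-1, 1)
via_newaxis = X[:, np.newaxis]          # add a new axis of length 1 as the second dimension
via_expand = np.expand_dims(X, axis=1)  # the same operation as a function call

print(via_reshape.shape, via_newaxis.shape, via_expand.shape)  # (5, 1) (5, 1) (5, 1)
print(np.array_equal(via_reshape, via_newaxis))                # True
print(np.array_equal(via_reshape, via_expand))                 # True
```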

Sources:
[1] https://docs.python.org/3/library/array.html
[2] https://docs.scipy.org/doc/numpy/user/quickstart.html
[3] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html 