# Working with data structures

## Junior mentor:

Y. Fabian Bautista


## Objectives of the tutorial

1. To create and manipulate data sets: lists, arrays, dictionaries
2. To use the Pandas package to manipulate data sets
3. To plot data structures using matplotlib
4. To use conditional statements
5. To fit data and compute the goodness of fit

### Loading packages

In [2]:
import pandas as pf # For data structure manipulations
import matplotlib.pyplot as plt # For plotting

# A review of lists

Recall from Tutorial 1 that lists are ordered sequences. A list is a mutable object and can contain elements of different <code>type</code>:

In [3]:
list1 = [1,3,"dark matter",(3,4)]

We can get the <code>type()</code> of the list as follows:

In [5]:
type(list1)

list

Indexing a list is simple:

In [9]:
list1[2]

'dark matter'

We can also access a component of a nested element of a list:

In [13]:
list1[3][0]

3

Lists can also be sliced:

In [27]:
list1[2:4]

['dark matter', (3, 4)]
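
As a small supplementary sketch (the names `last`, `first_two` and `every_other` are illustrative and not part of the tutorial), a slice has the general form `start:stop:step`, where the `stop` index is excluded, and negative indices count from the end of the list:

```python
list1 = [1, 3, "dark matter", (3, 4)]

# Negative indices count from the end of the list
last = list1[-1]

# The stop index is excluded, so this returns the elements at indices 0 and 1
first_two = list1[:2]

# A third number sets the step: here, every other element
every_other = list1[::2]

print(last)         # (3, 4)
print(first_two)    # [1, 3]
print(every_other)  # [1, 'dark matter']
```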

We can also ask for the <code>len()</code> of the list:

In [6]:
len(list1)

4

As mentioned, lists are mutable objects. Assigning a list to a new name does not copy it: both names refer to the same object, so a change made through one name is also visible through the other.

In [18]:
list2 = ['a','b','c']
list2

['a', 'b', 'c']

In [19]:
list3 = list2
list3

['a', 'b', 'c']

For instance, if we change one element of <code>list3</code>, the same element of <code>list2</code> also changes:

In [20]:
list3[1]=1
print(list3)
print(list2)


['a', 1, 'c']
['a', 1, 'c']
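
One way to confirm that the two names refer to the same object is the `is` operator, which tests identity rather than equality of contents. The following is a minimal sketch that reuses the variable names from the cells above:

```python
list2 = ['a', 'b', 'c']
list3 = list2            # no copy is made: list3 is a second name for the same object

# `is` checks whether both names point to the very same object
print(list3 is list2)    # True

list3[1] = 1             # mutate the list through one name...
print(list2)             # ...and the change is visible through the other name
```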


To avoid this, we can clone the list by taking a full slice, which creates a new list object:

In [21]:
list4 = [1,2,3,4]
list5 = list4[:]

In [24]:
list5[1] = 'a'
print(list5)
print(list4)

[1, 'a', 3, 4]
[1, 2, 3, 4]
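
The slice `[:]` is not the only way to copy a list. The sketch below (with illustrative names such as `copy_a` and `nested`, which are not from the tutorial) shows two equivalent alternatives, and also that these copies are shallow: nested mutable elements are still shared unless `copy.deepcopy` is used.

```python
import copy

list4 = [1, 2, 3, 4]

# Three equivalent ways to make a shallow copy of a list
copy_a = list4[:]
copy_b = list4.copy()
copy_c = list(list4)
print(copy_a is list4, copy_b is list4, copy_c is list4)  # False False False

# A shallow copy still shares nested mutable elements with the original
nested = [[1, 2], [3, 4]]
shallow = nested[:]
deep = copy.deepcopy(nested)

nested[0][0] = 'changed'
print(shallow[0])  # ['changed', 2] because the inner list is shared
print(deep[0])     # [1, 2] because the deep copy is fully independent
```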


We can concatenate lists as follows:

In [25]:
list1 + list2

[1, 3, 'dark matter', (3, 4), 'a', 1, 'c']

# Arrays

A numpy array is similar to a list. It's usually fixed in size and each element is of the same type. We can cast a list to a numpy array by first importing numpy: 

In [28]:
import numpy as np # For array and mathematical manipulations

## 1D Arrays

We cast a list as follows:


In [29]:
# Create a numpy array

a = np.array([0, 1, 2, 3, 4])
a

array([0, 1, 2, 3, 4])
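
Although arrays look like lists, operators can behave differently. As a hedged illustration (the names `list_a`, `arr_a` and so on are made up for this sketch), `+` concatenates lists but adds numpy arrays element by element; `np.concatenate` is the array counterpart of list concatenation:

```python
import numpy as np

list_a = [0, 1, 2]
list_b = [10, 20, 30]
joined_list = list_a + list_b              # lists: concatenation
print(joined_list)                         # [0, 1, 2, 10, 20, 30]

arr_a = np.array(list_a)
arr_b = np.array(list_b)
summed = arr_a + arr_b                     # arrays: element-wise sum
print(summed)                              # values 10, 21, 32

joined_arr = np.concatenate([arr_a, arr_b])  # arrays: explicit concatenation
print(joined_arr)                          # values 0, 1, 2, 10, 20, 30
```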

Each element is of the same type, in this case integers. As with lists, we can access each element with square brackets:


In [35]:
# Print each element

print("a[0]:", a[0])
print("a[1]:", a[1])
print("a[2]:", a[2])
print("a[3]:", a[3])
print("a[4]:", a[4])

a[0]: 0
a[1]: 1
a[2]: 2
a[3]: 3
a[4]: 4


If we check the type of the array we get <b>numpy.ndarray</b>:


In [37]:
# Check the type of the array

type(a)

numpy.ndarray

As numpy arrays contain data of the same type, we can use the attribute <code>dtype</code> to obtain the data type of the array’s elements. In this case it is a 64-bit integer (the default integer type can differ by platform and NumPy version):

In [38]:
# Check the type of the values stored in numpy array

a.dtype

dtype('int64')

We can create a numpy array with real numbers:


In [39]:
# Create a numpy array

b = np.array([3.1, 11.02, 6.2, 213.2, 5.2])

In [40]:
# Check the type of array

type(b)

numpy.ndarray

In [41]:
# Check the value type

b.dtype

dtype('float64')
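
Because every element must share one type, numpy picks a common type when the input list is mixed. The sketch below, using illustrative names (`mixed`, `as_int`, `explicit`), shows this upcasting together with two ways of controlling the type explicitly:

```python
import numpy as np

# A single float is enough to make the whole array floating point
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                 # float64

# astype returns a converted copy; float-to-int conversion truncates toward zero
as_int = mixed.astype(int)
print(as_int)                      # values 1, 2, 3

# The dtype can also be requested when the array is created
explicit = np.array([0, 1, 2], dtype=np.float64)
print(explicit.dtype)              # float64
```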

### Slicing

Like lists, numpy arrays can be sliced. Here we select the elements at indices 1 to 3 and assign them to a new numpy array <code>d</code>:

In [45]:
# Create numpy array

c = np.array([20, 1, 2, 3, 4])

# Slicing the numpy array

d = c[1:4]
d

array([1, 2, 3])
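
One difference from lists is worth a quick sketch: slicing a list creates a new list, whereas a basic slice of a numpy array is a view that shares memory with the original array. The name `d_copy` below is illustrative; calling `.copy()` is one way to obtain an independent array.

```python
import numpy as np

c = np.array([20, 1, 2, 3, 4])

d = c[1:4]               # a view: d shares its data with c
d[0] = 100               # writing through the view...
print(c)                 # ...also changes c[1]

d_copy = c[1:4].copy()   # an independent copy of the same slice
d_copy[0] = -1
print(c)                 # c is not affected by changes to d_copy
```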

### Selecting Elements with a List


Similarly, we can use a list of indices to select specific elements.
The list <code>select</code> contains several indices:

In [48]:
# Create the index list

select = [0, 2, 3]

We can use the list as an argument in the brackets. The output contains the elements at the corresponding indices:

In [49]:
# Use List to select elements

d = c[select]
d

array([20,  2,  3])
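
An index list can also appear on the left-hand side of an assignment, which sets all of the selected positions at once. A minimal sketch follows; the name `picked` and the value `100` are arbitrary choices for illustration. Note that, unlike a slice, selecting with a list returns a copy rather than a view.

```python
import numpy as np

c = np.array([20, 1, 2, 3, 4])
select = [0, 2, 3]

picked = c[select]     # selecting with a list returns a copy of those elements
c[select] = 100        # assign one value to every selected position

print(c)               # positions 0, 2 and 3 now hold 100
print(picked)          # still holds the original values 20, 2, 3
```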

### Other Attributes

Let's review some basic array attributes using the array <code>e</code>:

In [51]:
# Create a numpy array

e = np.array([0, 1, 2, 3, 4])
e

array([0, 1, 2, 3, 4])

The attribute <code>size</code> is the number of elements in the array:

In [52]:
# Get the size of numpy array

e.size

5

The next two attributes will make more sense when we get to higher dimensions but let's review them. The attribute <code>ndim</code> represents the number of array dimensions or the rank of the array, in this case, one:

In [53]:
# Get the number of dimensions of numpy array

e.ndim

1

The attribute <code>shape</code> is a tuple of integers indicating the size of the array in each dimension:

In [54]:
# Get the shape/size of numpy array

e.shape

(5,)
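
To preview how these attributes behave beyond one dimension, here is a small sketch with a hypothetical 2D array `m` (not part of the tutorial) built from a list of lists:

```python
import numpy as np

# A 2D array with 2 rows and 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.ndim)    # 2, one dimension for rows and one for columns
print(m.shape)   # (2, 3), the length along each dimension
print(m.size)    # 6, the total number of elements (2 * 3)
```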

### Mean of a 1D numpy array

In [56]:
# Get the mean of numpy array

mean = e.mean()
mean

2.0

### Standard deviation of a 1D numpy array


In [58]:
# Get the standard deviation of numpy array

standard_deviation = e.std()
standard_deviation

1.4142135623730951
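
By default, `std()` computes the population standard deviation, that is, the square root of the mean squared deviation from the mean. The sketch below checks this by hand and shows the `ddof=1` option, which divides by N - 1 to give the sample standard deviation. The names `manual_std` and `sample_std` are illustrative.

```python
import numpy as np

e = np.array([0, 1, 2, 3, 4])

# Population standard deviation computed step by step
deviations = e - e.mean()                       # distance of each value from the mean
manual_std = np.sqrt(np.mean(deviations ** 2))  # root of the mean squared deviation
print(np.isclose(manual_std, e.std()))          # True

# Sample standard deviation: divide by N - 1 instead of N
sample_std = e.std(ddof=1)
print(sample_std)
```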

### Maximal and minimal values in a 1D array

In [59]:
# Get the biggest value in the numpy array

max_e = e.max()
max_e

4

In [60]:
# Get the smallest value in the numpy array

min_e = e.min()
min_e

0
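
When the position of the extreme value matters and not only the value itself, `argmax()` and `argmin()` return the corresponding index. A short sketch, reusing the values of the array `b` defined earlier (the names `i_max` and `i_min` are illustrative):

```python
import numpy as np

b = np.array([3.1, 11.02, 6.2, 213.2, 5.2])

i_max = b.argmax()        # index of the largest value
i_min = b.argmin()        # index of the smallest value

print(i_max, b[i_max])    # 3 213.2
print(i_min, b[i_min])    # 0 3.1
```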