# Getting started

#### 1. Almost all data* can be represented as an array of numbers
For example:

- An image can be represented as a 3-dimensional array (height x width x RGB channels) of pixel intensities.
- An MP3 can be represented as an array of intensities with respect to time.
- A text document can be represented as binary data.

Basically, all data can be represented as an array of numbers.
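As a rough illustration of the three examples above, here is a minimal sketch using NumPy (introduced later in this notebook). The tiny image, the 440 Hz tone, the sample rate, and the variable names are all made up for the example.

```python
import numpy as np

# A tiny 2x2 RGB image: shape is (height, width, channels)
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]  # top-left pixel set to pure red

# One second of a 440 Hz tone; the 8000 Hz sample rate is an arbitrary choice
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

# Text encoded as UTF-8 bytes, viewed as an array of small integers
text = np.frombuffer("numpy".encode("utf-8"), dtype=np.uint8)

print("image shape:", image.shape)
print("audio shape:", audio.shape)
print("text bytes :", text.tolist())
```

In each case the data ends up as an array of numbers; only the shape and the number type differ.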

Okay, so we have a large array of numbers. The question now is how to store and use it efficiently.
           

#### 2. Existing solutions 
To work with a large array of numbers, Python already has the <code>list</code>, right?

In [42]:
million = list(range(0,10**6))
print(million[-1]) 

999999


So the list is an easy, fairly simple data structure to use. So that's all we need for working with image, text, or audio data, right...? Wait up!

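In [43]:
import sys
# sys.getsizeof reports the list object itself (its header and its array of item pointers),
# not the int objects those pointers refer to. That is about 9 bytes per item here.
print("Item Count -", len(million),"\n","Occupied Size - ",sys.getsizeof(million)," bytes")

Item Count - 1000000 
 Occupied Size -  9000112  bytes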


Notice that holding a basic list of a million integers takes about 9 million bytes, and that figure does not even include the integer objects the list points to.

C type | Storage size | Value range
------|--------|-------:
signed char | 1 byte | -128 to 127
int | 2 or 4 bytes | -32,768 to 32,767 or -2,147,483,648 to 2,147,483,647
unsigned int | 2 or 4 bytes | 0 to 65,535 or 0 to 4,294,967,295

From the table above, storing each value as a 4-byte unsigned int should take at most 4 * 1 million = 4,000,000 bytes. The list uses far more than that.
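The comparison can be checked with a few lines of arithmetic. This is only a sketch: the exact size reported for the list depends on the Python version and platform, so no fixed output is shown, and the variable names are illustrative.

```python
import sys

n_items = 10**6
numbers = list(range(n_items))

# What the data would need if each item were a 4-byte C integer
expected_bytes = 4 * n_items

# What the list object itself reports (header plus item pointers only)
list_bytes = sys.getsizeof(numbers)

print("expected for 4-byte ints:", expected_bytes)
print("list object alone       :", list_bytes)
print("ratio                   :", round(list_bytes / expected_bytes, 2))
```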

#### 3. Let's look at what's happening here
Python is dynamically typed, which means a variable can hold any type of data throughout its lifetime in a program. In a statically typed language, a variable initialized as an int must stay an int throughout its lifetime.
<code>
    Python: x = 10; x = {}           # allowed: x now refers to a dict
    C++:    int a = 10; a = "ten";   // compile error: a must stay an int</code>
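The Python half of that comparison can be run directly. A minimal sketch, with variable names chosen only for illustration:

```python
x = 10
first_type = type(x).__name__    # "int"

x = {}                           # the same name now refers to a dict
second_type = type(x).__name__   # "dict"

print(first_type, "->", second_type)  # prints: int -> dict
```

The name `x` is simply rebound to a new object; the type belongs to the object, not to the name.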
    
Let's look at the structure behind a Python variable to understand how Python supports this dynamic typing. Every Python object starts with a header that holds a reference count and a pointer to its type:
https://github.com/certik/python-2.7/blob/c360290c3c9e55fbd79d6ceacdfc7cd4f393c1eb/Doc/c-api/structures.rst



Hence <code>million[0]</code>, the value 0, takes more memory than just the number 0.
We need a more efficient mechanism for storing this data, with the ease of use of lists.
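To get a feel for the overhead, the sketch below measures a single int object and then adds the item sizes to the size of the list itself. The exact numbers vary by Python version and platform, so none are quoted here. The total is a rough estimate, since CPython shares some small integers between users.

```python
import sys

numbers = list(range(10**6))

# Size of one Python int object, including its object header
one_int_bytes = sys.getsizeof(numbers[-1])

# sys.getsizeof(list) does not follow the item pointers, so add the items ourselves
pointer_bytes = sys.getsizeof(numbers)
item_bytes = sum(sys.getsizeof(n) for n in numbers)
total_bytes = pointer_bytes + item_bytes

print("one int object :", one_int_bytes, "bytes")
print("list + items   :", total_bytes, "bytes")
print("raw 4-byte data:", 4 * len(numbers), "bytes")
```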

In [28]:
import array
# Typecode 'i' stores each item as a C signed int (4 bytes here) instead of a full Python object
million_array = array.array('i', million)
print("Item Count -", len(million_array),"\n","Occupied Size - ",sys.getsizeof(million_array)," bytes")

Item Count - 1000000 
 Occupied Size -  4000064  bytes


This is far better than storing it as a list. It is a good solution, but building multidimensional arrays and slicing and dicing them would be hard. There's an alternative, well-proven solution: <code>numpy</code>.
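As a small illustration of the multidimensional point, here is a sketch that takes a flat `array.array` and views it as a 3x4 grid with NumPy. The names `flat` and `grid` and the 12-item size are arbitrary choices for the example.

```python
import array
import numpy as np

# array.array is always one-dimensional
flat = array.array('i', range(12))

# NumPy can give the same numbers a 2-D shape
grid = np.array(flat, dtype="int32").reshape(3, 4)

single_item = grid[1, 2]    # row 1, column 2
one_column = grid[:, 1]     # column 1 of every row
corner = grid[:2, 2:]       # first two rows, last two columns

print(grid)
print(single_item, one_column.tolist(), corner.tolist())
```

Doing the same with `array.array` alone would mean computing the flat index by hand (for example `flat[row * 4 + col]`).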

In [2]:
import sys
import numpy as np

# Rebuild the list here so the cell also runs on a fresh kernel
million = list(range(10**6))
million_np_array = np.array(million, dtype="int32")
# nbytes is the size of the data buffer: 1,000,000 items * 4 bytes each
print("Item Count -", len(million_np_array),"\n","Occupied Size - ",million_np_array.nbytes," bytes")

Item Count - 1000000 
 Occupied Size -  4000000  bytes

#### 4. Introduction to numpy and its modules

In [3]:
import numpy as np

# Creating arrays from existing lists.
items = list(range(10))
np.array(items)


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
# Unlike Python lists, a numpy array contains elements of a single type.
# If the types do not match, numpy upcasts where it can (for example, int to float).
# Forcing an int dtype on float values drops the fractional part.
print("Just declaring the array.         -->", np.array([1, 2, 3, 2.7, 69.98]))
print("Declaring the array with int type -->", np.array([1, 2, 3, 2.7, 69.98], dtype='int'))
print("Declaring the array with float type -->", np.array([1, 2, 3, 2.7, 69.98], dtype='float32'))


Just declaring the array.         --> [ 1.    2.    3.    2.7  69.98]
Declaring the array with int type --> [ 1  2  3  2 69]
Declaring the array with float type --> [ 1.    2.    3.    2.7  69.98]
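The dtype that NumPy settles on can be inspected directly. A minimal sketch, with illustrative variable names; the default integer width depends on the platform, so only the dtype kind is checked for the all-int case.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3, 2.7])

print(ints.dtype.kind)   # 'i' means a signed integer type
print(mixed.dtype)       # float64: the ints were upcast to match 2.7

# Forcing an int dtype truncates toward zero rather than rounding
truncated = np.array([2.7, -2.7, 69.98], dtype="int")
print(truncated.tolist())
```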


In [25]:
# Functions to help get started with numpy

# Create an array of 0s with np.zeros (np.ones does the same for 1s)
zero_array = np.zeros(5)
print("Zero array : ",zero_array ) 
print("Data type by default : ", zero_array.dtype)
# The default dtype is float64, regardless of whether the machine is 32-bit or 64-bit

Zero array :  [0. 0. 0. 0. 0.]
Data type by default :  float64
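A few variations on the same idea, as a short sketch: `np.ones` for an array of 1s, an explicit `dtype`, and a tuple shape for more than one dimension. The variable names are only for illustration.

```python
import numpy as np

ones_array = np.ones(5)                  # five 1.0 values, float64 by default
int_zeros = np.zeros(5, dtype="int32")   # same length, but 4-byte integers
zero_grid = np.zeros((2, 3))             # a tuple shape gives 2 rows x 3 columns

print(ones_array, ones_array.dtype)
print(int_zeros, int_zeros.dtype)
print(zero_grid.shape)
```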
