### Introduction
Python is a dynamically typed, interpreted language,
where values are stored not in dense buffers but in scattered objects.  
Because of dynamic typing, every operation involves extra steps, such as checking types at run time. 
This is a primary reason that Python is slow compared to C for operations on numerical data.



Libraries such as NumPy and SciPy provide an alternative by storing numerical data in dense, typed arrays. 


### Data Types

Python supports the following data types:

- Boolean
- Set 
- Dictionary
- List 
- Integer, String, Float 
- Object
- Complex
- None 

----------------------------------------------------------------------------

References: 
 - http://www.datasciencecourse.org/ 
 - https://www.coursera.org/specializations/statistics-with-python 



In [None]:
import math

### Numerical or Quantitative Types

For discrete values we use Integer, which stores the value exactly as is. 

For continuous values we use Float, which allows for decimal places but loses precision.
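
The difference between exact integers and approximate floats can be seen in a short sketch. The variable names here are illustrative only.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum is slightly off.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# Integers, by contrast, are stored exactly, even when very large.
big = 2**100 + 1
print(big - 2**100)              # 1
```

When comparing floats, a tolerance-based check such as `math.isclose` is usually safer than `==`.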

### Integers

The standard Python implementation is written in C. This means that every Python object is simply a cleverly-disguised C structure, 
which contains not only its value, but other information as well. 
For example, when we define an integer in Python, such as x = 10000, x is not just a "raw" integer. It's actually a pointer to a compound C structure, 
which contains several values. Looking through the Python 3.4 source code, we find that the integer (long) type definition effectively looks like this (once the C macros are expanded):

```
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};

```
A single integer in Python 3.4 actually contains four pieces:

- ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
- ob_type, which encodes the type of the variable
- ob_size, which specifies the size of the following data members
- ob_digit, which contains the actual integer value that we expect the Python variable to represent.
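
A rough way to observe this overhead, assuming the standard CPython interpreter, is to ask for the size of an integer object with `sys.getsizeof`. The exact byte counts vary between Python versions and platforms, so the sketch below only compares sizes rather than stating them.

```python
import sys

x = 10000
# On CPython, the reported size includes the object header (reference count,
# type pointer, size field) in addition to the digits of the value itself.
small_size = sys.getsizeof(x)
large_size = sys.getsizeof(2**100)
print(small_size, large_size)

# A raw 64-bit C integer would occupy 8 bytes; the Python object is larger,
# and it grows as more digits are needed.
print(small_size > 8, large_size > small_size)
```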


In [None]:
type (4)

int

In [None]:
type (0)

int

In [None]:
type (-3)

int

In [None]:
numbers = [2,3,4,5]

print (sum (numbers)/len (numbers))

type (sum (numbers)/len(numbers)) 

3.5


float

### Float

In [None]:
3/5


0.6

In [None]:
type (3/5)

float

In [None]:
type (math.pi)

float

In [None]:
numbers = [math.pi, 3/5, 4.1]
type (sum (numbers)/len (numbers))

float

### Categorical or Qualitative Types

- Nominal
  - Boolean
  - String
  - None

- Ordinal
  - Useful for creating visuals



### Booleans

In [None]:
type (bool ('yes'))


bool
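
Note that `bool('yes')` is `True` because the string is non-empty, not because of what it says. A small sketch of this truthiness rule (the list name is illustrative):

```python
# Any non-empty string is truthy, regardless of its contents.
print(bool("yes"), bool("no"), bool(""))   # True True False

# Zero, None, and empty containers are falsy; most other values are truthy.
falsy_values = [0, 0.0, None, [], {}, ""]
print(any(bool(v) for v in falsy_values))  # False
```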

In [None]:
if 6>5:
    print ("yes")

yes


In [None]:
myList = [True, 6<5, 1==3, None is None ]

In [None]:
for element in myList:
    print (type(element))

<class 'bool'>
<class 'bool'>
<class 'bool'>
<class 'bool'>


### String

In [None]:
type ("The quick brown fox jumped over the lazy dog")

str

In [None]:
Mylist = ['brown','green','red']

print (Mylist)


['brown', 'green', 'red']


### NoneType

In [None]:
x = None 
type (x)

NoneType

### List 

In [None]:
MyList = [1, 1.1, "myString", None] #Lists are denoted by items within square brackets.




Items within a list can be accessed via several types of indexing: indexing to a given element, negative indexing (backward from the end of a list), and slice-based indexing that returns subsets of the list.

In [None]:
print(MyList[0])
print(MyList[-1])
print(MyList[1:2])
print(MyList[0:4:2])
for a in MyList:
    print(a)

1
None
[1.1]
[1, 'myString']
1
1.1
myString
None


In [None]:
for element in MyList:
    print (type(element))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'NoneType'>


A list comprehension is a method for constructing a list by iterating over another list. Let's suppose we wanted to create a list3 object that included every element from list1, but with an underscore after each string. 
We could do it by explicitly constructing the list and adding each element, like so.

In [None]:
list1 = ["a", "b", "c", "d"]

# Build list3 one element at a time
list3 = []
for x in list1:
    list3.append(x + "_")
list3

However, this quickly gets verbose if we want to create several new lists this way.  We could get the same result through a list comprehension, which has the syntax `[some_expression(item) for item in list]`, and returns a new list by applying `some_expression` (not necessarily an actual function, just some expression that involves `item`) to each element of the list.

In [None]:
list1 = ["a", "b", "c", "d"]

list3 = [x + "_" for x in list1]
list3

['a_', 'b_', 'c_', 'd_']

As a slightly more complex example, let's use list comprehensions to count the number of words in a file.  The "shakespeare.txt" file included with this notebook contains the text of all the collected works of Shakespeare.  Let's first read all the lines of Shakespeare into a list called `lines`, then sum the number of words on each line.

In [None]:
sum([len(line.split(" ")) for line in lines])
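
The same counting idea can be tried on a few in-memory lines. This is a minimal sketch: the sample `lines` below are a hypothetical stand-in for the file contents, which could be read with something like `with open("shakespeare.txt") as f: lines = f.readlines()`. It also shows that `split(" ")` and `split()` can give different counts.

```python
# A tiny stand-in for the contents of the file (hypothetical sample lines).
lines = [
    "To be, or not to be,\n",
    "that  is the question.\n",   # note the double space
    "\n",
]

# Splitting on a single space counts empty strings and blank lines as words.
count_single_space = sum(len(line.split(" ")) for line in lines)

# Splitting on any run of whitespace avoids that.
count_whitespace = sum(len(line.split()) for line in lines)

print(count_single_space, count_whitespace)   # 12 10
```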

### List implementation internals

As a final note about lists, it is important for writing efficient Python code to understand a little about how lists 
are implemented internally.  Python lists are not really "lists" in the typical CS sense of the word (that is, linked lists). A Python list holds a pointer to a contiguous buffer of pointers, each of which points to a Python object, which in turn holds its data (for example, an integer). 
 
 
![Python list internals](list_internal.png)
 
 
 
 
A Python list is really a _dynamically sized array_.  That is, a list is an array of fixed-size elements (more precisely, it is an array of references to abstract "Python object" elements, which can be numbers, strings, other lists, etc.; the references stored in the array are themselves fixed size).  The array is pre-allocated with a certain number of "extra elements" that can be used to add new items to the list. If we want to append a new item and there is not enough room in the underlying array, the array is resized with some additional buffer to allow for further elements (for example, the underlying size could be doubled, though in reality Python uses a slightly more involved growth algorithm than doubling each time).  Because this resizing happens relatively infrequently, it does not cost much from a computational perspective.  Put in big O notation, the following operations on lists are all constant time:

- Append: O(1) (average case)
- Lookup: O(1)
- Delete last element: O(1)
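
The pre-allocation described above can be observed indirectly. The sketch below assumes the standard CPython interpreter, where `sys.getsizeof` on a list reflects the allocated buffer. Exact sizes differ between versions, so only the number of distinct sizes is examined.

```python
import sys

items = []
sizes = []
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# If the buffer were resized on every append, all 64 sizes would differ.
# On CPython there are far fewer distinct sizes, because spare capacity
# is allocated ahead of time.
distinct_sizes = sorted(set(sizes))
print(len(sizes), len(distinct_sizes))
```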

On the other hand, if you attempt to insert a new element somewhere in the middle of the list (shifting down later elements), or delete an item in the middle of the list (shifting up later elements), these operations are expensive, because we need to copy every single item in the list after the inserted or deleted item, and move it in memory.  In big O notation, the following operations take linear time:

- Insert element at arbitrary position: O(n)
- Delete element at arbitrary position: O(n)

As an illustration of this, let's consider two versions of code that create a list, either by appending or by inserting (at the beginning of the list) new elements:
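
A minimal sketch of such a comparison follows. The function names and the value of `n` are illustrative choices, and the measured times will vary by machine.

```python
import timeit

def build_by_append(n):
    """Build [0, 1, ..., n-1] by appending at the end (amortized O(1) each)."""
    result = []
    for i in range(n):
        result.append(i)
    return result

def build_by_insert(n):
    """Build [n-1, ..., 1, 0] by inserting at the front (O(n) each)."""
    result = []
    for i in range(n):
        result.insert(0, i)
    return result

n = 20000
t_append = timeit.timeit(lambda: build_by_append(n), number=1)
t_insert = timeit.timeit(lambda: build_by_insert(n), number=1)
print(f"append: {t_append:.4f}s  insert at front: {t_insert:.4f}s")
```

Because each front insertion shifts every existing element, the total work for the second version grows roughly quadratically with `n`, while the first grows roughly linearly.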

### Dictionaries

The next main built-in data type you'll use in data science is the dictionary.  Dictionaries are mappings from keys to values, where keys can be any "immutable" (more precisely, hashable) Python type (most commonly strings, numbers, booleans, tuples ... but importantly _not_ lists or other dictionaries), and values can be any Python type (including lists or dictionaries).

Dictionaries can be created with curly brackets, like the following:

In [None]:
dict_example = {"my_val1":1, "my_val2":2, "my_val3":3}

And then elements are accessed by square brackets.

In [None]:
dict_example["my_val1"] 

1
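
The restriction on key types mentioned earlier can be checked directly. In this sketch the `coordinates` dictionary is a made-up example.

```python
# Tuples of immutable values are hashable, so they can serve as keys.
coordinates = {(0, 0): "origin", (1, 2): "point A"}
print(coordinates[(1, 2)])   # point A

# Lists are mutable and unhashable, so using one as a key raises TypeError.
try:
    bad = {[1, 2]: "point A"}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```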

Unlike lists, dictionaries can't be indexed by slices, negative indices, or position in general.  
Since Python 3.7 a dictionary does preserve the order in which keys were inserted, but items are still looked up by key, not by position in a sequence.  
We can use the same notation to assign new elements to the dictionary.

In [None]:
dict_example["d"] = 4

Note that we can make this assignment even though `dict_example["d"]` did not previously exist (whereas merely reading `dict_example["d"]` before assigning it would raise a `KeyError`).  You can also check whether a key belongs to a dictionary with the `in` operator:

In [None]:
"some_val" in dict_example

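Putting assignment, lookup of a missing key, and membership tests together gives the following sketch. It rebuilds `dict_example` so that it runs on its own. `dict.get` is shown as one common way to supply a default for a missing key.

```python
dict_example = {"my_val1": 1, "my_val2": 2, "my_val3": 3}

# Reading a missing key raises KeyError...
try:
    dict_example["d"]
except KeyError:
    print("'d' is not in the dictionary yet")

# ...but assigning to it creates the entry.
dict_example["d"] = 4

print("d" in dict_example)          # True
print("some_val" in dict_example)   # False

# dict.get returns a default instead of raising for a missing key.
print(dict_example.get("some_val", 0))   # 0
```
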
Finally, there is the analogue to list comprehensions: dictionary comprehensions.  These are specified similarly to a list comprehension, but are denoted by the syntax `{key(item) : value(item) for item in list}`.
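
A short sketch of this syntax, reusing the `list1` values from the list comprehension example (the dictionary names are illustrative):

```python
list1 = ["a", "b", "c", "d"]

# Map each string to its position in the list.
positions = {item: index for index, item in enumerate(list1)}
print(positions)   # {'a': 0, 'b': 1, 'c': 2, 'd': 3}

# Map each string to its underscore-suffixed version.
suffixed = {x: x + "_" for x in list1}
print(suffixed["c"])   # c_
```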

### Dictionary internals

Internally, dictionaries are represented using _hash tables_.  That is, the dictionary elements are all contained in a relatively small array.  When inserting a new element in the dictionary, we compute a _hash_ of the key, and take this number modulo the size of the array.  If that location in the underlying array is empty, we store the key/value pair at that index.  If the location is already occupied by a _different_ element (which has to be possible, since there are more possible keys than slots in the underlying array), we execute what is called a "probing strategy" to find the next free slot.  The details of the hashing function and probing strategy aren't important (some info is on the page linked below), but the key point is that the process is deterministic, so we can easily both assign elements and look up elements by key (similarly computing the hash and probing until the key at the location matches the key we are looking for ... note that this is why we need to store both the key _and_ value in the array).

A good hash function will tend to generate "random" locations in the array, so there is a high probability we will find an empty slot without too many probing iterations.  In Python, once the array is 2/3 full, we double (or quadruple, for small-sized dictionaries) the size of the underlying array, and rebuild the hash table (which is required, since the location of each key/value pair changes with the size of the underlying array).  Just like with a list, although this rebuilding operation is slow, because it happens relatively infrequently, it does not end up costing too much time.  In big O notation, the operations of a dictionary all have (average case) constant time:

- Insert new element: O(1)
- Lookup element: O(1)
- Delete element: O(1)
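
The hash-then-probe scheme described above can be sketched with a toy class. This is not how CPython's `dict` is actually implemented: the class name `ToyHashTable`, the use of simple linear probing, the starting capacity of 8, and plain doubling on resize are all assumptions made for illustration, and deletion is left out.

```python
class ToyHashTable:
    """Open addressing with linear probing. Illustrative only: CPython's
    dict uses a different probing sequence and memory layout."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # each slot holds a (key, value) pair
        self.count = 0

    def _find_slot(self, key):
        # Start at hash(key) modulo the array size, then step forward until
        # we reach either the matching key or an empty slot.
        index = hash(key) % len(self.slots)
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % len(self.slots)
        return index

    def __setitem__(self, key, value):
        index = self._find_slot(key)
        if self.slots[index] is None:
            self.count += 1
        self.slots[index] = (key, value)
        # Grow and rebuild once the array is more than 2/3 full.
        if self.count * 3 > len(self.slots) * 2:
            self._resize(2 * len(self.slots))

    def __getitem__(self, key):
        entry = self.slots[self._find_slot(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self, new_capacity):
        old_entries = [e for e in self.slots if e is not None]
        self.slots = [None] * new_capacity
        self.count = 0
        for key, value in old_entries:
            self[key] = value   # positions change with the new array size


table = ToyHashTable()
for i in range(20):
    table[f"key{i}"] = i

print(table["key7"], len(table.slots))   # 7 32
```

Storing the key alongside the value is what lets `_find_slot` tell a genuine match apart from a different key that happened to land in the same slot.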

More information about dictionary internals is here: [Python dictionary implementation](https://www.laurentluce.com/posts/python-dictionary-implementation/)

## Classes

The first thing to note is that all methods within a Python class take an instance of that class as the first argument, usually named `self` in implementations (but it can be named anything). Class methods and static methods (see the discussion in the list below) behave differently. The second point is that, unlike some other object-oriented languages, Python does not enforce a distinction between "public" and "private" variables. In the class below, it may seem like the `n` variable should be private (inaccessible outside the class) and only accessible via the method `get_n()`, but we can just as easily access it directly.

In [None]:
class MyClass:
    def __init__(self, n):
        self.n = n
    
    def get_n(self):
        return self.n

a = MyClass(1)
a.get_n()
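
Both points can be checked with a short sketch. It repeats the class definition so that it runs on its own.

```python
class MyClass:
    def __init__(self, n):
        self.n = n

    def get_n(self):
        return self.n

a = MyClass(1)
print(a.get_n())   # 1
print(a.n)         # 1, the attribute is directly readable

# Nothing prevents outside code from changing it either.
a.n = 5
print(a.get_n())   # 5

# The method call is shorthand for passing the instance explicitly as self.
print(MyClass.get_n(a))   # 5
```

By convention, a leading underscore (for example `_n`) signals that an attribute is meant for internal use, but this is not enforced by the language.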