# Data Structures

For Python problems, we suggest you brush up on the following things:

    data structures (lists, tuples, dictionaries)
    flow control (if-else, for, while loops)
    the "itertools" library
    the "collections" library
    working with files.

Please note that, for the Python problem, you'll be expected to use only "vanilla" Python: native Python data structures and modules from the standard library. You won't be able to use NumPy, pandas, and the like.
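As a quick refresher on the two standard-library modules named in the list above, here is a minimal sketch. The sample data in `words` is made up for illustration, and the four tools shown are only a small selection of what `collections` and `itertools` offer.

```python
from collections import Counter, defaultdict
from itertools import combinations, groupby

# Hypothetical sample data, used only for illustration
words = ["cake", "cookie", "cake", "pie", "cookie", "cake"]

# Counter tallies how often each hashable item occurs
counts = Counter(words)
top_word, top_count = counts.most_common(1)[0]

# defaultdict creates a default value (here an empty list) for missing keys
by_length = defaultdict(list)
for word in sorted(set(words)):
    by_length[len(word)].append(word)

# combinations yields every unordered pair without repetition
pairs = list(combinations(sorted(set(words)), 2))

# groupby only groups consecutive items, so sort by the same key first
grouped = {
    letter: list(group)
    for letter, group in groupby(sorted(words), key=lambda w: w[0])
}

print(top_word, top_count)
print(dict(by_length))
print(pairs)
print(grouped)
```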

For the SQL problem sets, make sure you're comfortable with aggregation functions (COUNT(), SUM() and the like), CASE WHEN statements, the various types of joins, filtering (WHERE and HAVING), and subqueries. For bonus points, check out the Postgres documentation to see how window functions work.
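To practise these SQL features without setting up a server, one option is the standard-library `sqlite3` module. The sketch below is an assumption about how you might practise, not part of the problem sets: it uses SQLite rather than Postgres, and the `orders` table and its rows are invented for illustration. Window functions are left out because they need a reasonably recent SQLite build.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("ann", 40.0), ("bob", 5.0), ("cy", 70.0), ("cy", 30.0)],
)

query = """
SELECT customer,
       COUNT(*)    AS n_orders,
       SUM(amount) AS total,
       CASE WHEN SUM(amount) >= 100 THEN 'large' ELSE 'small' END AS tier
FROM orders
WHERE amount > 0          -- WHERE filters rows before grouping
GROUP BY customer
HAVING COUNT(*) > 1       -- HAVING filters groups after aggregation
ORDER BY customer
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

Here `bob` is dropped by the HAVING clause because he has only one order, while the CASE WHEN expression labels each remaining customer by total spend.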

## NOTES


### Primitive Data Structures

These are the most basic data structures. They are the building blocks for data manipulation and each holds a single, simple value. Python has four primitive variable types:

Integers

You can use an integer to represent numeric data, specifically whole numbers such as 4, 5, or -1. Python integers have no fixed upper or lower bound.

Float

"Float" stands for "floating point number". You can use it for real numbers written with a decimal point, such as 1.11 or 3.14.

String

Strings are sequences of letters, words, or other characters. In Python, you can create strings by enclosing a sequence of characters within a pair of single or double quotes. For example: 'cake', "cookie", etc.

Boolean

This built-in data type can take the values True and False, which are often interchangeable with the integers 1 and 0. Booleans are useful in conditional and comparison expressions.
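A short sketch of how the numeric and Boolean types described above behave in practice. The variable names are illustrative only.

```python
import math

# bool is a subclass of int, so True and False act like 1 and 0 in arithmetic
bool_is_int = isinstance(True, int)
total_true = sum([True, False, True])   # counts the True values

# Integers are not limited to 32 or 64 bits
big = 2 ** 100

# Floats are binary approximations, so exact comparisons can surprise you
approx = 0.1 + 0.2
is_exact = approx == 0.3
is_close = math.isclose(approx, 0.3)    # compare with a tolerance instead

print(bool_is_int, total_true, big.bit_length(), is_exact, is_close)
```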

In [1]:
x = "cake"
y = 'cookie'
x + " & " + y

'cake & cookie'

In [2]:
x*2

'cakecake'

In [4]:
#string slicing
z1 = x[2:]
print(z1)

ke


In [6]:
z2 = y[0] + y[1]
z2

'co'

In [14]:
y = 'cookie'
print(y.capitalize())

Cookie


In [13]:
print(len(y))

6


In [15]:
y.isdigit()

False

In [16]:
y = 'cookie'
y.replace('co','Co')

'Cookie'

In [17]:
str1 = 'cookie'
str2 = 'cook'
str1.find(str2)

0

In [18]:
x = 2
y = "The Godfather: Part "
fav_movie = (y) + str(x)
print(fav_movie)

The Godfather: Part 2


## Non-Primitive Data Structures

Non-primitive types are the sophisticated members of the data structure family. They don't just store a value, but rather a collection of values in various formats.

In the traditional computer science world, the non-primitive data structures are divided into:

    Arrays
    Lists
    Files

### Array

First off, arrays in Python are a compact way of collecting basic data types: all the entries in an array must be of the same data type. However, arrays are not all that popular in Python, unlike in other programming languages such as C++ or Java.

In general, when people talk about arrays in Python, they are actually referring to lists. However, the two are fundamentally different. An array can be seen as a more efficient way of storing a list whose elements all share the same data type.

In Python, arrays are provided by the array module, which must be imported before you can create and use them. The element data type is fixed when the array is created and is specified with a type code, a single character such as the "I" (unsigned int) in the example below:

In [19]:
import array as arr
a = arr.array("I",[3,6,9])
type(a)

array.array
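To see the type-code constraint in action, the sketch below tries to append values that do not fit the "I" (unsigned int) type code. The variable names are illustrative; the exact error messages vary between Python versions, so only the exception types are recorded.

```python
import array as arr

a = arr.array("I", [3, 6, 9])   # "I": unsigned int
a.append(12)                    # fine: a non-negative whole number

errors = []
for bad_value in (1.5, -1):
    try:
        a.append(bad_value)     # a float and a negative number do not fit "I"
    except (TypeError, OverflowError) as exc:
        errors.append(type(exc).__name__)

d = arr.array("d", [1.5, 2.5])  # "d": double-precision float

print(a.tolist(), errors, d.typecode)
```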

### List

Lists in Python are used to store collections of heterogeneous items. They are mutable, which means that you can change their content without changing their identity. You can recognize lists by the square brackets [ and ] that hold their elements, separated by commas. Lists are built into Python: you do not need to import anything to use them.
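The phrase "change their content without changing their identity" can be checked with the built-in `id()` function. A minimal sketch, with made-up variable names:

```python
items = [1, "apple", 3.0]        # heterogeneous items in one list
identity_before = id(items)

items.append(True)               # mutate in place
items[1] = "orange"
same_object = id(items) == identity_before

alias = items                    # a second name for the same list object
alias.pop()                      # also visible through "items"

snapshot = list(items)           # an independent shallow copy
snapshot.append("extra")         # does not affect "items"

print(same_object, items, snapshot)
```

Because `alias` and `items` name the same object, a change made through one is seen through the other. Copying with `list(...)` avoids that.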

In [20]:
x = [] # Empty list
type(x)

list

In [21]:
x1 = [1,2,3]
type(x1)

list

In [22]:
x2 = list([1,'apple',3])
type(x2)

list

In [23]:
print(x2[1])

apple


In [24]:
x2[1] = 'orange'
print(x2)

[1, 'orange', 3]


In [25]:
list_num = [1,2,45,6,7,2,90,23,435]
list_char = ['c','o','o','k','i','e']

list_num.append(11) # Add 11 to the list, by default adds to the last position
print(list_num)

[1, 2, 45, 6, 7, 2, 90, 23, 435, 11]


In [26]:
list_num.insert(0, 11)
print(list_num)

[11, 1, 2, 45, 6, 7, 2, 90, 23, 435, 11]


In [27]:
list_char.remove('o') 
print(list_char)

['c', 'o', 'k', 'i', 'e']


In [28]:
list_char.pop(-2) # Removes the item at the specified position
print(list_char)

['c', 'o', 'k', 'e']


In [29]:
list_num.sort() # In-place sorting
print(list_num)

[1, 2, 2, 6, 7, 11, 11, 23, 45, 90, 435]


In [30]:
list.reverse(list_num)
print(list_num)

[435, 90, 45, 23, 11, 11, 7, 6, 2, 2, 1]


https://www.datacamp.com/community/tutorials/18-most-common-python-list-questions-learn-python

Arrays versus Lists

Now that you have seen lists in Python, you may be wondering why you need arrays at all. The two differ in what they store: an array holds values of a single basic type in a compact buffer, while a list holds references to objects of any type. Performing an operation on every item at once (element-wise arithmetic) is a feature of NumPy arrays, not of the standard-library array module or of lists.
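To make the comparison concrete, here is a sketch that uses only the standard-library `array` module. This is an assumption about scope: NumPy arrays, which do support element-wise arithmetic directly, are not covered here. The variable names are illustrative.

```python
import array as arr

numbers_list = [3, 6, 9]
numbers_array = arr.array("I", numbers_list)

# Multiplying either sequence by an integer repeats it; it does not scale values
repeated_list = numbers_list * 2
repeated_array = numbers_array * 2

# Element-wise work is written out explicitly for both types
doubled_list = [v * 2 for v in numbers_list]
doubled_array = arr.array("I", (v * 2 for v in numbers_array))

# The array stores raw fixed-size values; itemsize is platform dependent
buffer_bytes = numbers_array.itemsize * len(numbers_array)

print(repeated_array.tolist(), doubled_array.tolist(), buffer_bytes)
```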

### Tuples

Tuples are another standard sequence data type. The difference between tuples and lists is that tuples are immutable: once a tuple is defined, you cannot delete, add, or edit its values. This is useful when you pass data to someone else and do not want them to modify your collection, only to read it or to work on a separate copy.

In [32]:
x_tuple = 1,2,3,4,5
y_tuple = ('c','a','k','e')
x_tuple[0]

1

In [33]:
y_tuple[3]
x_tuple[0] = 0

TypeError: 'tuple' object does not support item assignment
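The sketch below shows the usual ways of working around immutability: catching the error, building a new tuple from a list copy, and using tuples where a hashable value is needed. The names `point` and `labels` are invented for illustration.

```python
point = (1, 2, 3)

# Item assignment on a tuple raises TypeError
try:
    point[0] = 0
except TypeError as exc:
    message = str(exc)

# To "edit" a tuple, work on a list copy and build a new tuple
working_copy = list(point)
working_copy[0] = 0
updated = tuple(working_copy)

# Tuples of hashable values can be dictionary keys; lists cannot
labels = {(0, 0): "origin", (1, 2): "somewhere"}

# Tuple unpacking assigns each element to its own name
x, y, z = point

print(message)
print(point, updated, labels[(0, 0)], x, y, z)
```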

### Dictionary

Dictionaries are exactly what you need if you want to implement something similar to a telephone book. None of the data structures that you have seen so far is well suited to looking up a value by name.

This is where a dictionary comes in handy. Dictionaries are made up of key-value pairs: the key identifies the item, and the value holds the data associated with that key.

In [34]:
x_dict = {'Edward':1, 'Jorge':2, 'Prem':3, 'Joe':4}
del x_dict['Joe']
x_dict

{'Edward': 1, 'Jorge': 2, 'Prem': 3}

In [35]:
x_dict['Edward'] # Prints the value stored with the key 'Edward'.

1

In [36]:
len(x_dict)

3

In [37]:
x_dict.keys()

dict_keys(['Jorge', 'Prem', 'Edward'])

In [38]:
x_dict.values()

dict_values([2, 3, 1])
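Returning to the telephone book idea, here is a minimal sketch of the common dictionary operations. The names and phone numbers are made up for illustration.

```python
# Hypothetical phone book: names map to made-up numbers
phone_book = {"Edward": "555-0101", "Jorge": "555-0102"}

phone_book["Prem"] = "555-0103"      # add a new key-value pair
phone_book["Edward"] = "555-0199"    # update the value for an existing key

has_joe = "Joe" in phone_book                    # membership test on keys
joe_number = phone_book.get("Joe", "unknown")    # default instead of KeyError

# items() yields (key, value) pairs; sorting gives a stable listing
listing = [f"{name}: {number}" for name, number in sorted(phone_book.items())]

print(has_joe, joe_number)
print(listing)
```

Using `get()` with a default avoids the `KeyError` that `phone_book["Joe"]` would raise for a missing key.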

### Sets

Sets are collections of distinct (unique) objects. They are useful when you need only the unique values in a dataset. A set is unordered but mutable, and membership tests on it are fast, which is very helpful when going through a huge dataset.

In [39]:
x_set = set('CAKE&COKE')
y_set = set('COOKIE')

print(x_set)

{'K', 'A', '&', 'O', 'C', 'E'}
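The two sets defined above can be combined with the usual set operators. A small sketch follows. The last line is an extra illustration of removing duplicates while keeping the original order, which relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
x_set = set("CAKE&COKE")
y_set = set("COOKIE")

common = x_set & y_set      # intersection: in both sets
either = x_set | y_set      # union: in at least one set
only_x = x_set - y_set      # difference: in x_set but not in y_set
only_y = y_set - x_set      # difference: in y_set but not in x_set

# Sets are unordered; dict.fromkeys removes duplicates and keeps first-seen order
unique_in_order = list(dict.fromkeys("CAKE&COKE"))

print(sorted(common), sorted(only_x), sorted(only_y))
print(unique_in_order)
```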


### Files

Files are traditionally considered part of data structures. Even though big data is commonplace in the data science industry, a programming language without the capability to store and retrieve previously stored information would hardly be useful. You still have to make use of all the data sitting in files, and this section shows how.

The syntax to read and write files in Python is similar to other programming languages but a lot easier to handle. Here are some of the basic functions that will help you to work with files using Python:


    open() to open files in your system, the filename is the name of the file to be opened;
    read() to read entire files;
    readline() to read one line at a time;
    write() to write a string to a file, and return the number of characters written; and
    close() to close the file.


In [None]:
# File modes (2nd argument): 'r' (read), 'w' (write), 'a' (append), 'r+' (read and write)
f = open('file_name', 'w')

# Writes the string to the file, returning the number of characters written
f.write('Add this line.')

f.close()

# Reopen in read mode: a file opened with 'w' cannot be read
f = open('file_name', 'r')

# Reads one line at a time
f.readline()

# Reads the rest of the file
f.read()

f.close()

The second argument in the open() function is the file mode. It allows you to specify whether you want to read (r), write (w), append (a) or both read and write (r+).
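In practice, files are usually opened in a `with` block, which closes the file automatically even if an error occurs. The sketch below walks through the write, append, and read modes. The temporary directory and the file name `notes.txt` are assumptions made so that the example does not touch your own files.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# 'w' creates the file (or truncates an existing one)
with open(path, "w") as f:
    written = f.write("first line\n")   # returns the number of characters

# 'a' appends to the end without truncating
with open(path, "a") as f:
    f.write("second line\n")

# 'r' reads; readline() returns one line, read() returns whatever remains
with open(path, "r") as f:
    first = f.readline()
    rest = f.read()

# Iterating over the file object yields one line at a time
with open(path) as f:
    lines = [line.rstrip("\n") for line in f]

print(written, repr(first), repr(rest), lines)
```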

To learn more about file handling in Python, be sure to check out this page.