In [1]:
import numpy as np # load numpy module

## Lists

 - A list is an ordered, mutable collection of elements that may be homogeneous or heterogeneous (int, float, string, etc.)

##### Time complexity: 
 - Read (with index) / append: O(1) (append is amortized O(1))
 - Insert / delete: O(n) - expensive as the items after the position have to be shifted after insertion or deletion
 - Value lookup: O(n)
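
As a rough illustration of the operations listed above, the sketch below reads by index, appends, and looks up a value. The name `items` is made up for this example and is not used elsewhere in the notebook.

```python
# Illustrative list; `items` is a hypothetical name
items = ['hello', 'how', 'are', 'you', 1, 10]

third = items[2]              # read by index: O(1)
items.append('well')          # append at the end: amortized O(1)
has_ten = 10 in items         # value lookup scans the list: O(n)
pos_you = items.index('you')  # index of the first match, also O(n)

print(third, has_ten, pos_you)
```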

In [2]:
list_ = ['hello', 'how', 'are', 'you', 1, 10, 'how', 'well', 'are'] # created a simple list
list_

['hello', 'how', 'are', 'you', 1, 10, 'how', 'well', 'are']

In [3]:
n = len(list_)  # length of the list_
n

9

Indices are numbered from <b> 0 to n-1 </b> or alternatively from <b> -n to -1 </b>

In [4]:
print(list_[0]) # first element
print(list_[8]) # last element

print(list_[-9]) # first element
print(list_[-1]) # last element

hello
are
hello
are


#### Slicing
 - To get a specific set of elements from the list

In [5]:
list_[:3]  # the start index is included, the stop index is excluded

['hello', 'how', 'are']

Only the elements with indices 0-2 are retrieved; the element at index 3 is excluded.

In [6]:
list_[2:5]  # indices 2-4 are retrieved

['are', 'you', 1]

In [7]:
list_[2:]  # indices 2-8 are retrieved. Leaving the right end empty means "through the last element"

['are', 'you', 1, 10, 'how', 'well', 'are']

In [8]:
list_[:4] # indices 0-3. Leaving the left end empty means "from the beginning of the list"

['hello', 'how', 'are', 'you']

In [9]:
list_[-1:]  # starts at index -1, the last element of the list

['are']

In [10]:
list_[:-1] # only the last element is skipped

['hello', 'how', 'are', 'you', 1, 10, 'how', 'well']

In [11]:
list_[:-2] # the last two elements (indices -1 and -2) are skipped

['hello', 'how', 'are', 'you', 1, 10, 'how']

In [12]:
list_[::-1]  # to reverse the order

['are', 'well', 'how', 10, 1, 'you', 'are', 'how', 'hello']
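
The reversal above uses the third slice field, the step. As a small sketch, the same field can pick every k-th element, and an empty slice gives a shallow copy. The names `every_second`, `middle_step`, and `copy_` are illustrative only.

```python
list_ = ['hello', 'how', 'are', 'you', 1, 10, 'how', 'well', 'are']

every_second = list_[::2]   # indices 0, 2, 4, 6, 8
middle_step = list_[1:8:3]  # indices 1, 4, 7
copy_ = list_[:]            # shallow copy of the whole list

print(every_second)  # ['hello', 'are', 1, 'how', 'are']
print(middle_step)   # ['how', 1, 'well']
print(copy_ == list_, copy_ is list_)  # True False
```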

#### Delete

In [13]:
list_

['hello', 'how', 'are', 'you', 1, 10, 'how', 'well', 'are']

In [14]:
del list_[4]

list_ # the element at list_[4] was deleted and the later elements moved left by one

['hello', 'how', 'are', 'you', 10, 'how', 'well', 'are']
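
Besides `del`, lists offer two other ways to delete, sketched here on a fresh copy of the original list: `pop` removes by index and returns the value, and `remove` deletes the first element equal to a given value. The name `popped` is illustrative.

```python
list_ = ['hello', 'how', 'are', 'you', 1, 10, 'how', 'well', 'are']

popped = list_.pop(4)  # removes index 4 and returns the removed value
list_.remove('how')    # removes only the first 'how'

print(popped)  # 1
print(list_)   # ['hello', 'are', 'you', 10, 'how', 'well', 'are']
```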

#### Replace

In [15]:
list_[4] = 1232

list_ # list_[4] is replaced by 1232 and all other elements remain the same

['hello', 'how', 'are', 'you', 1232, 'how', 'well', 'are']

#### Insert

In [16]:
list_.insert(4, 111)

list_ # 111 is inserted at list_[4] and the elements with index >= 4 are moved to the right by one

['hello', 'how', 'are', 'you', 111, 1232, 'how', 'well', 'are']
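
Adding at the end of a list avoids the shifting that `insert` causes. A minimal sketch with a shortened list, contrasting `append` (one element) with `extend` (every element of an iterable):

```python
list_ = ['hello', 'how', 'are', 'you']

list_.append('today')  # adds one element at the end
list_.extend([1, 2])   # adds each element of the iterable
list_.append([3, 4])   # a list passed to append is added as ONE nested element

print(list_)       # ['hello', 'how', 'are', 'you', 'today', 1, 2, [3, 4]]
print(len(list_))  # 8
```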

These basic list operations come up constantly in day-to-day project work.

## Arrays

An array is a collection of homogeneous items that are contiguously arranged in memory.
 - Mutable but static (values can be changed in place, but the size cannot be changed; in other words, new elements cannot be added to it)

##### Time complexity:
 - Same as that of lists - O(1) with index and O(n) for lookup

In [17]:
arrays_ = np.array(list_)
arrays_

array(['hello', 'how', 'are', 'you', '111', '1232', 'how', 'well', 'are'],
      dtype='<U5')

Notice that the numbers in list_ were converted to strings when we converted list_ to an array. An array must consist of elements of the same data type, so NumPy picks one type that can represent every element. An arbitrary string such as 'hello' cannot be converted to a number, so the numbers are converted to strings instead.
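
A small sketch of this type promotion, using two made-up arrays (`ints_and_floats` and `numbers_and_text` are illustrative names). The exact string width in the dtype depends on the NumPy version, so only the dtype kind is checked here.

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])     # ints are promoted to float
numbers_and_text = np.array([1, 2, 'how'])  # everything becomes a string

print(ints_and_floats.dtype)        # float64
print(numbers_and_text.dtype.kind)  # 'U' means Unicode string
print(numbers_and_text)
```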

In [18]:
arrays_[1] = 122  # arrays_[1] element will be replaced with "122"
arrays_

array(['hello', '122', 'are', 'you', '111', '1232', 'how', 'well', 'are'],
      dtype='<U5')

Insert and delete operations cannot be performed in place on an array because its size cannot be altered. When an array is created, a fixed block of memory is allocated that cannot be extended. Assigning a value to the <i>i</i>th index overwrites the existing value, and the other elements remain unchanged.
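
NumPy does provide `np.insert` and `np.delete`, but they work around the fixed size by returning a new array and leaving the original untouched. The sketch below uses illustrative arrays (`arr`, `words`). It also shows a side effect of fixed-width string arrays: a longer string assigned into a `<U5` array is silently truncated.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])

inserted = np.insert(arr, 1, 15)  # new array with 15 placed at index 1
deleted = np.delete(arr, 2)       # new array without the element at index 2

print(inserted)  # [10 15 20 30 40]
print(deleted)   # [10 20 40]
print(arr)       # [10 20 30 40], the original is unchanged

words = np.array(['hello', 'how'])  # dtype '<U5': at most 5 characters each
words[1] = 'wonderful'              # does not fit, so it is cut to 5 characters
print(words[1])                     # wonde
```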

## Strings
 - Immutable, homogeneous, and dynamic
 
##### Time complexity:
 - Same as that of lists and arrays - O(1) with index and O(n) for lookup

##### Reading/slicing:
 - Same as in lists. Just consider each character in the string as an element of a list.

In [19]:
string_ = "Hello, I am Jack and I am a data scientist"

string_[1]  # 'e' from Hello

'e'

In [20]:
string_[:11]  # slicing

'Hello, I am'

In [21]:
string_[1] = 'r' # raises TypeError because strings are immutable

TypeError: 'str' object does not support item assignment
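
Since a character cannot be assigned in place, one common workaround is to build a new string from slices. A minimal sketch (`changed` is an illustrative name):

```python
string_ = "Hello, I am Jack and I am a data scientist"

# Take everything before index 1, the new character, and everything after index 1
changed = string_[:1] + 'r' + string_[2:]

print(changed)  # Hrllo, I am Jack and I am a data scientist
print(string_)  # the original string is unchanged
```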

#### Combining strings:

In [22]:
# You can add more words or lines by concatenating two strings with the + operator.
string_new = "I am 10 years experienced"

string_ + ". " + string_new  # very simple, but tedious when ". " has to go between every pair of strings

'Hello, I am Jack and I am a data scientist. I am 10 years experienced'

In [23]:
". ".join([string_, string_new])  # This solves the above problem in joining multiple strings

'Hello, I am Jack and I am a data scientist. I am 10 years experienced'

#### Find
- Returns the index of the first (leftmost) occurrence of the searched word/character

In [24]:
string_.find("data")  # returns the index of the beginning of the word

28

In [25]:
string_[28:32]

'data'

In [26]:
string_.find("Amazon")  # Returns -1 if the search word is not in the string

-1

#### rfind:
- Returns the index of the last (rightmost) occurrence of the searched word/character

In [27]:
print(string_.find("am"))
print(string_.rfind("am"))  # returns the index of the first character of the rightmost word

9
23


#### Count
- Counts the number of non-overlapping occurrences of a word/character

In [28]:
string_.count('am')

2

In [29]:
string_.count('a')

7

#### Split
- Splits a string, at a specified character, into a list of elements

In [30]:
string_list = string_.split(" ")
string_list

['Hello,', 'I', 'am', 'Jack', 'and', 'I', 'am', 'a', 'data', 'scientist']
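
A few variations on `split`, sketched with illustrative names: a multi-character separator, a limit on the number of splits, and the difference between `split(" ")` and `split()` when the text contains repeated spaces.

```python
string_ = "Hello, I am Jack and I am a data scientist"

by_comma = string_.split(", ")     # the separator can be longer than one character
first_two = string_.split(" ", 2)  # split at most twice, keep the rest together

spaced = "a  b   c"                  # hypothetical text with repeated spaces
with_empties = spaced.split(" ")     # every single space is a boundary
without_empties = spaced.split()     # no argument: any run of whitespace is one boundary

print(by_comma)         # ['Hello', 'I am Jack and I am a data scientist']
print(first_two)        # ['Hello,', 'I', 'am Jack and I am a data scientist']
print(with_empties)     # ['a', '', 'b', '', '', 'c']
print(without_empties)  # ['a', 'b', 'c']
```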

#### Startswith
- Checks if the string starts with the search word/character

In [31]:
string_.startswith('Hello')

True

In [32]:
string_.startswith('I') # 'I' is not at the beginning of the string

False

In [33]:
# To check for a word in the string, test every word in the list as follows

string_list  # list of words from string_

[word_.startswith('am') for word_ in string_list]

[False, False, True, False, False, False, True, False, False, False]

So there are two 'am's in the string

In [34]:
sum([word_.startswith('am') for word_ in string_list]) # to directly get the count

2
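
One caveat: `startswith('am')` also matches longer words that merely begin with "am". The sketch below uses a made-up sentence to show the difference and counts exact matches with `list.count` instead.

```python
# Hypothetical sentence containing a word that only starts with 'am'
words = "I am amazed that I am here".split()

starts_with_am = sum(w.startswith('am') for w in words)  # counts 'am', 'amazed', 'am'
exactly_am = words.count('am')                           # counts only the exact word

print(starts_with_am)  # 3
print(exactly_am)      # 2
```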

#### Endswith
- Similar to the 'startswith' method, but checks the end of the string

In [35]:
string_.endswith('scientist') # the suffix can be a whole word

True

In [36]:
string_.endswith('tist') # note that it could be a partial word

True

In [37]:
string_.endswith('data')

False

#### Replace

 - Although a single character cannot be replaced in place, the replace method returns a new string in which every occurrence of a word is replaced with another word. The original string is left unchanged.

In [38]:
string_.replace("Jack", "John")  # replaces Jack with John

'Hello, I am John and I am a data scientist'
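
A short sketch of two details of `replace`: it substitutes every occurrence unless an optional third argument limits the count, and it never modifies the original string. The variable names are illustrative.

```python
string_ = "Hello, I am Jack and I am a data scientist"

replaced_all = string_.replace("am", "was")       # every occurrence
replaced_first = string_.replace("am", "was", 1)  # only the first occurrence

print(replaced_all)    # Hello, I was Jack and I was a data scientist
print(replaced_first)  # Hello, I was Jack and I am a data scientist
print(string_)         # unchanged
```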

## Tuples

 - Immutable and static
 - Tuples are used when we want to <u>restrict the user from changing values</u> in the collection

#### Time complexity:
 - Same as that of lists and arrays - O(1) with index and O(n) for lookup

In [39]:
tuples_ = tuple(list_)
tuples_

('hello', 'how', 'are', 'you', 111, 1232, 'how', 'well', 'are')

Note that the elements are enclosed in parentheses. 

In [40]:
# Another method for creating tuples
tuples_ = 'hello', 'how', 'are', 1, 1232, 'well'
tuples_

('hello', 'how', 'are', 1, 1232, 'well')
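
A minimal sketch of what "immutable" means for a tuple, plus two related details: unpacking, and the trailing comma needed for a one-element tuple. The names `assignment_failed`, `first`, `rest`, `single`, and `not_a_tuple` are illustrative.

```python
tuples_ = ('hello', 'how', 'are', 1, 1232, 'well')

try:
    tuples_[0] = 'hi'          # item assignment is not allowed
    assignment_failed = False
except TypeError:
    assignment_failed = True

first, *rest = tuples_         # unpacking: first element, then the remainder as a list

single = ('hello',)            # the trailing comma makes a one-element tuple
not_a_tuple = ('hello')        # without the comma this is just a string

print(assignment_failed)       # True
print(first, rest)             # hello ['how', 'are', 1, 1232, 'well']
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple str
```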

## Sets

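A set is an unordered collection with <u> no duplicate elements</u>. Set is another data type that we often use in data wrangling. It is used to eliminate duplicate values from a list or an array.
 - Mutable and dynamic (elements can be added or removed), but elements cannot be accessed or replaced by index; the elements themselves must be hashable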

In [41]:
set_ = set(list_)
set_

{111, 1232, 'are', 'hello', 'how', 'well', 'you'}

In [42]:
# set_[2] = 'lel' will not work as sets do not support indexing
set_.add('lel')  # values can be added - dynamic
set_

{111, 1232, 'are', 'hello', 'how', 'lel', 'well', 'you'}

A set keeps only the unique values from list_, in no specific order.
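
Because a set does not keep the original order, one possible approach when the order matters is `dict.fromkeys`, which keeps the first occurrence of each value in insertion order. The sketch below also shows the basic set operations. All names other than `list_` are illustrative.

```python
list_ = ['hello', 'how', 'are', 'you', 'how', 'are']

unique_any_order = set(list_)                 # duplicates removed, order not guaranteed
unique_in_order = list(dict.fromkeys(list_))  # duplicates removed, first-seen order kept

a = {1, 2, 3}
b = {3, 4}
union_ = a | b         # elements in either set
intersection_ = a & b  # elements in both sets
difference_ = a - b    # elements in a but not in b

print(unique_in_order)                     # ['hello', 'how', 'are', 'you']
print(union_, intersection_, difference_)
```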

## Dictionaries
A dictionary consists of key-value pairs. Since Python 3.7, a dictionary preserves the order in which the pairs were inserted. The key-value pairs are stored in an associative array (a hash table). Each key maps to a value, which can itself be a list or another dictionary.

Each <b> key is converted to a hash code </b> by a hash function, and the hash code determines where the pair is stored. Ideally every key gets a distinct hash code, so the read operation is much faster than in the other data types: O(1) on average, even for lookup by key (hash collisions make the worst case O(n)). Please note that here we <u> sacrifice space complexity, by storing the keys and the hash table in memory, to achieve better time complexity</u>.

#### Time complexity:
 - Most dict operations are O(1) on average (even for lookup by key)

In [43]:
# Let's store the ages of people as a dictionary
dict_ = {"Julie": 32, "Rahul": 23, "Jasmine": 12, "Jack": 15, "Jennifer": 18}
dict_

{'Julie': 32, 'Rahul': 23, 'Jasmine': 12, 'Jack': 15, 'Jennifer': 18}

In [44]:
# To retrieve values
dict_['Jasmine']

12
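
Indexing with a key that is not in the dictionary raises a `KeyError`. A minimal sketch of two safer alternatives, `get` with an optional default and the `in` operator ("Amazon" is just a key that is known to be missing):

```python
dict_ = {"Julie": 32, "Rahul": 23, "Jasmine": 12, "Jack": 15, "Jennifer": 18}

age_missing = dict_.get("Amazon")     # None instead of a KeyError
age_default = dict_.get("Amazon", 0)  # explicit default value
has_jack = "Jack" in dict_            # membership test on the keys, O(1) on average

print(age_missing, age_default, has_jack)  # None 0 True
```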

In [45]:
# To get the keys (returned as a view object, not a list)
dict_.keys()

dict_keys(['Julie', 'Rahul', 'Jasmine', 'Jack', 'Jennifer'])

In [46]:
# To get the values (returned as a view object, not a list)
dict_.values()

dict_values([32, 23, 12, 15, 18])
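
`keys()` and `values()` return view objects rather than lists. A small sketch of the difference, with a shortened dictionary: a view follows later changes to the dictionary, while `list(...)` takes a snapshot. The names `keys_view` and `keys_list` are illustrative.

```python
dict_ = {"Julie": 32, "Rahul": 23}

keys_view = dict_.keys()        # live view of the keys
keys_list = list(dict_.keys())  # snapshot copied into a list

dict_["Jack"] = 15              # add a new pair afterwards

print(list(keys_view))  # ['Julie', 'Rahul', 'Jack']
print(keys_list)        # ['Julie', 'Rahul']
```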

#### Other methods to create a dictionary

In [47]:
# method 2
dict([('Julie', 32), ('Rahul', 23), ('Jasmine', 12)])

{'Julie': 32, 'Rahul': 23, 'Jasmine': 12}

In [48]:
# method 3
dict(Julie=32, Rahul=23, Jasmine=12)

{'Julie': 32, 'Rahul': 23, 'Jasmine': 12}

In [49]:
# method 4 - dictionary comprehension technique
{x: 2*x for x in range(5)}

{0: 0, 1: 2, 2: 4, 3: 6, 4: 8}

#### To print a dictionary

In [50]:
for k, v in dict_.items():
    print(k, v)

Julie 32
Rahul 23
Jasmine 12
Jack 15
Jennifer 18


#### To reverse the key-value pairs

In [51]:
dict_reversed = {v:k for k, v in dict_.items()}  # Note the curly braces
dict_reversed

{32: 'Julie', 23: 'Rahul', 12: 'Jasmine', 15: 'Jack', 18: 'Jennifer'}
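
The reversal above works because every age is unique. With duplicate values, later pairs overwrite earlier ones. The sketch below uses a made-up dictionary (`ages`) to show the problem and one possible fix that groups the keys into lists.

```python
# Hypothetical data in which two people share the same age
ages = {"Julie": 32, "Rahul": 23, "Jack": 23}

naive = {v: k for k, v in ages.items()}  # 'Jack' overwrites 'Rahul' for key 23

grouped = {}
for name, age in ages.items():
    grouped.setdefault(age, []).append(name)  # collect all names per age

print(naive)    # {32: 'Julie', 23: 'Jack'}
print(grouped)  # {32: ['Julie'], 23: ['Rahul', 'Jack']}
```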

------