### Combined usage of the list and string data types

In [1]:
s = "This is a string"

# s.split() creates a list of substrings
# that are separated by whitespace
s.split()

['This', 'is', 'a', 'string']

In [5]:
# To get all characters of a string in a list
l = list(s)
print(l)
print(type(l))

['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g']
<class 'list'>


In [6]:
# Using the join method to combine a list of strings into a single string
# The string that join() is called on is the separator; "" joins the elements of l with nothing in between
s_joined = "".join(l)
print(s_joined)
print(type(s_joined))

This is a string
<class 'str'>
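
The cells above can be combined into a round trip: split a string into words, then join the words back together. The following is a minimal sketch; the names `words`, `rebuilt` and `numbers` are only illustrative and do not appear elsewhere in this notebook.

```python
s = "This is a string"

# Split on whitespace, then rejoin with a single space as the separator
words = s.split()
rebuilt = " ".join(words)
print(rebuilt == s)        # True

# The separator passed to join() can be any string
print("-".join(words))     # This-is-a-string

# join() only accepts strings, so other types have to be converted first
numbers = [1, 2, 3]
print(", ".join(str(n) for n in numbers))   # 1, 2, 3
```

Note that the round trip only reproduces the original exactly because the words in `s` are separated by single spaces; runs of several spaces or tabs would be collapsed by `split()`.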


### Deeper knowledge of the Python 'list'

-> A Python list is a very flexible data structure that lets us perform complex tasks easily, but this flexibility comes at the cost of performance. Because a list is mutable and can contain elements of any data type, it stores references (pointers) to separate objects rather than the values themselves. Keeping track of all those memory addresses, and working out each element's type at run time, makes the list inefficient as a base for high-performance numerical computation.

-> Furthermore, a list can grow and shrink on demand, so it is a dynamic data structure that doesn't have a predefined size or shape.

-> We will see in future projects that although a Python 'list' lets us work on various problems quite easily, we will often have to stop using it and switch to NumPy's 'ndarray' object for large-scale data processing and analysis.
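
The points above can be made concrete with a small sketch. It shows a list holding mixed types, growing and shrinking in place, and then gives a rough memory comparison against an `ndarray`. The variable names are illustrative, the array is assumed to use 64-bit integers, and the list figure is only an estimate: `sys.getsizeof` is applied to the container and to every integer object, and exact sizes vary between Python versions.

```python
import sys
import numpy as np

# A list can hold objects of different types side by side
mixed = [1, "two", 3.0, [4, 5]]
print([type(item).__name__ for item in mixed])

# A list grows and shrinks in place
mixed.append(6)
mixed.pop(0)
print(len(mixed))

# Rough memory comparison for one million integers
n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The list itself only stores pointers; every int is a separate object on top of that
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"list (container + int objects): {list_bytes} bytes")

# The ndarray stores the raw 8-byte values in one contiguous buffer
print(f"ndarray data buffer: {np_arr.nbytes} bytes")
```

The fixed element type and contiguous layout are what allow NumPy to process the data in compiled loops instead of handling one Python object at a time.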

In [25]:
# In the next few cells we will analyze how 'inefficient' the list data structure and the Python language are.
# Please don't take this to mean that Python is a 'bad' or 'slow' language; what we will actually
# learn from this is how seamlessly we can make Python flexible or performant as necessary.
#
# The example we will look at here is very basic: we will add 10,000,000 numbers, starting from 1 and ending at 10,000,000.

In [28]:
# Standard Library to help us benchmark performance
import time
# Third Party Library
import numpy as np

In [30]:
l = list(range(1, 10000001))
nparray = np.array(l)

In [38]:
start = time.perf_counter_ns()

s = 0
for number in l:
    s = s + number

end = time.perf_counter_ns()

print(f"The for loop method runtime: {end - start} nanosec")

for_method = end - start

The for loop method runtime: 1172847300 nanosec


In [39]:
start = time.perf_counter_ns()

sum(l)

end = time.perf_counter_ns()

print(f"The sum method runtime: {end - start} nanosec")

sum_method = end - start

The sum method runtime: 306067100 nanosec


In [40]:
start = time.perf_counter_ns()

np.sum(nparray)

end = time.perf_counter_ns()

print(f"The np.sum method runtime: {end - start} nanosec")

npsum_method = end - start

The np.sum method runtime: 6236000 nanosec


In [42]:
print("The ratio of the benchmarks is: ")
print(f"npsum : sum : for = {1} : {(sum_method/npsum_method):.2f} : {(for_method/npsum_method):.2f}")

The ratio of the benchmarks is: 
npsum : sum : for = 1 : 49.08 : 188.08
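
A single `time.perf_counter_ns()` measurement can be noisy, so the exact ratios will change from run to run and from machine to machine. One possible way to get steadier numbers is the standard library's `timeit` module, which repeats each measurement. The sketch below is an illustration rather than a replacement for the cells above: the names `for_loop_sum`, `values` and `values_np` are hypothetical, and the input is reduced to 1,000,000 numbers so that it runs quickly, which means the ratios it prints will not match the ones shown above.

```python
import timeit
import numpy as np

# Smaller input than in the cells above so the sketch runs quickly
n = 1_000_000
values = list(range(1, n + 1))
values_np = np.array(values, dtype=np.int64)

def for_loop_sum(items):
    """Add the numbers one by one in a plain Python loop."""
    total = 0
    for number in items:
        total = total + number
    return total

# All three approaches should agree on the result before we time them
expected = n * (n + 1) // 2
assert for_loop_sum(values) == sum(values) == int(np.sum(values_np)) == expected

# repeat() returns one timing per run; the minimum is the least disturbed by noise
t_for = min(timeit.repeat(lambda: for_loop_sum(values), number=1, repeat=5))
t_sum = min(timeit.repeat(lambda: sum(values), number=1, repeat=5))
t_np = min(timeit.repeat(lambda: np.sum(values_np), number=1, repeat=5))

print(f"npsum : sum : for = 1 : {t_sum / t_np:.2f} : {t_for / t_np:.2f}")
```

Checking that the three results are equal before timing them guards against comparing methods that silently compute different things.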
