<h1><u>Python list basics</u></h1>
<p>Objective: To summarize and practice the basics of handling lists in Python using core methods.</p>
<p style="color:#666666">Last updated: 12th Jul 2017<br>Akshay Sehgal, www.asehgal.com</p>

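Unlike C++, where the built-in sequence type is the array, Python is built around lists. In general - 
- An array is a fixed-size data structure that stores its elements contiguously, so reading or writing any element by index takes the same (constant) time.
- A list, in the dynamic-array sense that Python uses, can grow and shrink. Reading by index and appending at the end are fast; inserting or deleting in the middle is slow because the later elements must be shifted.
- A linked list is a third type of data structure (which won't be discussed here). It is also dynamic, but it allows fast insertion and deletion at a known position and slow access by index.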

In Python, both lists and arrays are available (arrays through the standard-library array module), but lists are the primary one used. Python defines a set of methods for working with lists easily. These are the core of the discussion here.
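As a rough illustration of the distinction above, the sketch below contrasts a plain list with the standard-library array type. The variable names are made up for this example, and the array module is only one of several array implementations available in Python.

```python
from array import array

# A list grows dynamically and may hold elements of different types.
numbers = [1, 2, 3]
numbers.append(4)           # appending at the end is cheap
numbers.insert(0, "zero")   # inserting at the front shifts every element

# A standard-library array stores a single numeric type ('i' = signed int).
int_array = array("i", [1, 2, 3])
int_array.append(4)

try:
    int_array.append("five")   # a string does not fit an 'i' array
    type_error_raised = False
except TypeError:
    type_error_raised = True

print(numbers)
print(int_array.tolist())
print(type_error_raised)
```

The list accepts the string without complaint, while the typed array rejects it.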

Let's define a list first.

In [6]:
mylist = [1,1,2,3,3,4,4,4,1,1,3,2,1,6,6,5]

One basic task that every coder comes across is counting how often each element occurs in a list. For a single value, this can be done using the count method.

In [7]:
mylist.count(4)

3

In [8]:
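# A more generic implementation: count every element in the list

# Define an empty dictionary to hold element -> frequency
y = {}

# count() rescans the whole list on each call, so repeated elements are recounted
for i in mylist:
    y[i] = mylist.count(i)
print(y)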

{1: 5, 2: 2, 3: 3, 4: 3, 6: 2, 5: 1}
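Because the loop above calls count once per element, it rescans the list on every iteration. A possible alternative, sketched here with a hypothetical dictionary named frequencies, walks the list only once and keeps a running total per element.

```python
mylist = [1, 1, 2, 3, 3, 4, 4, 4, 1, 1, 3, 2, 1, 6, 6, 5]

# Walk the list once, incrementing a running total for each element.
frequencies = {}
for item in mylist:
    # get() returns 0 the first time an element is seen
    frequencies[item] = frequencies.get(item, 0) + 1

print(frequencies)
```

The result matches the dictionary printed above, with keys in order of first appearance.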


In [9]:
# One-line approach to the same task
from collections import Counter
print(dict(Counter(mylist)))

{1: 5, 2: 2, 3: 3, 4: 3, 6: 2, 5: 1}
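A Counter can also be queried directly, without converting it to a dict. The short sketch below (the name counts is chosen for illustration) shows two conveniences: most_common for the highest frequencies, and a count of zero for elements that never appear.

```python
from collections import Counter

mylist = [1, 1, 2, 3, 3, 4, 4, 4, 1, 1, 3, 2, 1, 6, 6, 5]
counts = Counter(mylist)

# The single most frequent element, returned as a list of (element, count) pairs.
print(counts.most_common(1))

# A missing element yields 0 rather than raising a KeyError.
print(counts[7])
```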


The second most common task a coder has to do is applying a particular function to each element of a given list. For this example, let's say we want to square each element of mylist. Here is how the built-in map function is used to achieve this.

In [11]:
a = list(map(lambda x:x**2,mylist))
print(a)

[1, 1, 4, 9, 9, 16, 16, 16, 1, 1, 9, 4, 1, 36, 36, 25]
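The same transformation is often written as a list comprehension, which many Python programmers find easier to read than map with a lambda. The sketch below, using illustrative variable names, checks that both forms give the same result.

```python
mylist = [1, 1, 2, 3, 3, 4, 4, 4, 1, 1, 3, 2, 1, 6, 6, 5]

# map applies the lambda to each element; list() materializes the result.
squares_map = list(map(lambda x: x ** 2, mylist))

# A list comprehension expresses the same operation inline.
squares_comp = [x ** 2 for x in mylist]

print(squares_comp == squares_map)
```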


Working with lists is especially useful when handling data in text format, as in natural language processing. Let's say we have a piece of text we want to work with.

In [19]:
text = "Akshay is a data scientist from ipredictt. Akshay enjoys working on python. Akshay is also a guitarist."

# Find the character index of the first and last occurrence of "Akshay".
print(text.index("Akshay"))
print(text.rindex("Akshay"))

0
76
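One caveat worth a quick sketch: index raises a ValueError when the substring is absent, whereas the related find method returns -1. The search word "pianist" below is a made-up example that does not occur in the text.

```python
text = "Akshay is a data scientist from ipredictt. Akshay enjoys working on python. Akshay is also a guitarist."

# find accepts an optional start position, so we can skip past the first match.
second_occurrence = text.find("Akshay", 1)
print(second_occurrence)

# find signals a missing substring with -1 ...
missing_position = text.find("pianist")
print(missing_position)

# ... while index raises a ValueError instead.
try:
    text.index("pianist")
    index_raised = False
except ValueError:
    index_raised = True
print(index_raised)
```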


Let's try to split the text by spaces first, and then count the frequency of each word. This is the basic idea behind a count vectorizer (for example, CountVectorizer in scikit-learn).

In [22]:
# The logical way to do this is to use the split method on the string. By default it splits on any whitespace.

text_tokens = text.split()
print(text_tokens)
print(dict(Counter(text_tokens)))

['Akshay', 'is', 'a', 'data', 'scientist', 'from', 'ipredictt.', 'Akshay', 'enjoys', 'working', 'on', 'python.', 'Akshay', 'is', 'also', 'a', 'guitarist.']
{'Akshay': 3, 'is': 2, 'a': 2, 'data': 1, 'scientist': 1, 'from': 1, 'ipredictt.': 1, 'enjoys': 1, 'working': 1, 'on': 1, 'python.': 1, 'also': 1, 'guitarist.': 1}
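Notice that tokens such as 'python.' keep their trailing period, so 'python' and 'python.' would be counted as different words. One simple way to normalize the tokens, sketched here under the assumption that only trailing periods and letter case need cleaning, is to strip and lowercase each word before counting.

```python
from collections import Counter

text = "Akshay is a data scientist from ipredictt. Akshay enjoys working on python. Akshay is also a guitarist."

# Strip periods from both ends of each token and lowercase it.
clean_tokens = [word.strip(".").lower() for word in text.split()]
word_counts = Counter(clean_tokens)

print(word_counts["akshay"])
print(word_counts["python"])
```

Real text usually needs more thorough cleaning (commas, quotes, and so on), which this sketch does not attempt.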


In [23]:
# We can also split the text into sentences by passing the separator as an argument to the split method.

text_sents = text.split(".")
print(text_sents)

['Akshay is a data scientist from ipredictt', ' Akshay enjoys working on python', ' Akshay is also a guitarist', '']


Notice that the last element of text_sents is an empty string. Sometimes we need to clean a list of tokens before we can start working with it. Again, let's try identifying the empty elements with the map function as before.

In [25]:
print(list(map(lambda x:x!='',text_sents)))

[True, True, True, False]


This shows that the last list element is the one we want to remove, but map only identifies it. This is where the filter function comes into play. It takes the same kind of function as map, but instead of returning the boolean results, it keeps only the elements of the given list for which the function returns True.

In [26]:
print(list(filter(lambda x:x!='',text_sents)))

['Akshay is a data scientist from ipredictt', ' Akshay enjoys working on python', ' Akshay is also a guitarist']
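The filtered sentences still carry a leading space left over from the split. As a possible refinement, the sketch below combines the cleaning and the filtering in a single list comprehension; the name sentences is chosen for illustration.

```python
text = "Akshay is a data scientist from ipredictt. Akshay enjoys working on python. Akshay is also a guitarist."

# strip() removes surrounding whitespace; the condition drops empty strings.
sentences = [s.strip() for s in text.split(".") if s.strip()]

print(sentences)
```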


# References
-  https://www.quora.com/What-is-the-difference-between-an-array-a-list-and-a-linked-list