## list and set - Usage

Let us see some real-world usage of `list` and `set` while building Python-based applications.

* `list` is used more often than `set`.
  * Reading data from file into a `list`
  * Reading data from a table into a `list`
* We can convert a `list` to a `set` to perform these operations.
  * Get unique elements from the `list`
  * Perform `set` operations between two lists, such as union, intersection, and difference
* We can convert a `set` to a `list` to perform these operations.
  * Reverse the collection
  * Append multiple collections to create new collections while retaining duplicates
* You will see some of these in action as we get into other related topics down the line.
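The conversions listed above can be sketched as follows. This is a minimal illustration; the lists `a` and `b` are hypothetical sample data, not values from the orders file.

```python
# Hypothetical sample lists, used only to illustrate the conversions
a = [1, 2, 2, 3, 4]
b = [3, 4, 4, 5]

# list -> set: get unique elements
unique_a = set(a)

# list -> set: set operations between two lists
in_both = set(a) & set(b)    # intersection
in_either = set(a) | set(b)  # union
only_in_a = set(a) - set(b)  # difference

# set -> list: a set has no order, so sort first and then reverse
descending = sorted(in_either)[::-1]

# set -> list: concatenating lists retains elements present in both sets
combined = list(unique_a) + list(set(b))

print(unique_a, in_both, in_either, only_in_a)
print(descending)
print(len(combined))
```

The operators `&`, `|` and `-` are equivalent to the `intersection`, `union` and `difference` methods when both operands are sets.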

In [1]:
%%sh

ls -ltr /data/retail_db/orders/part-00000

-rw-r--r-- 1 root root 2999944 Jan 21  2021 /data/retail_db/orders/part-00000


In [2]:
# Reading data from file into a list
path = '/data/retail_db/orders/part-00000'
# On Windows, a path would look like 'C:\\users\\itversity\\Research'
orders_file = open(path)

In [3]:
orders_raw = orders_file.read()

In [4]:
orders = orders_raw.splitlines()

In [5]:
orders[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [6]:
len(orders) # same as number of records in the file

68883
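The cells above leave the file handle open. A common alternative is to read the file inside a `with` block so that it is closed automatically. The sketch below assumes a small temporary file as a stand-in for `/data/retail_db/orders/part-00000`, so that it runs anywhere; the names `sample_lines`, `sample_path` and `sample_orders` are hypothetical.

```python
import os
import tempfile

# Hypothetical stand-in for the orders file: the first three records shown above
sample_lines = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
]

# Write the sample records to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(sample_lines) + '\n')
    sample_path = tmp.name

# Read the file into a list; the file is closed when the block exits
with open(sample_path) as sample_file:
    sample_orders = sample_file.read().splitlines()

# Clean up the temporary file
os.remove(sample_path)

print(len(sample_orders))  # same as the number of records written
```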

In [7]:
# Get unique dates
dates = ['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0']

In [8]:
dates

['2013-07-25 00:00:00.0',
 '2013-07-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2014-01-25 00:00:00.0']

In [9]:
len(dates)

4

In [10]:
set(dates)

{'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}

In [11]:
len(dates) # the original list is unchanged; set(dates) returned a new set

4
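To count the unique dates, take the length of the set rather than of the original list. The sketch below reuses the same `dates` values; the names `unique_dates` and `unique_in_order` are introduced here for illustration. It also shows one way to drop duplicates while keeping the original order, which a `set` does not guarantee.

```python
dates = ['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0',
         '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0']

# Converting to a set drops the duplicate entry
unique_dates = set(dates)
print(len(unique_dates))

# sorted() returns a list in a predictable order
print(sorted(unique_dates))

# dict.fromkeys keeps the first occurrence of each value in its
# original position (dicts preserve insertion order in Python 3.7+)
unique_in_order = list(dict.fromkeys(dates))
print(unique_in_order)
```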

In [12]:
# Creating new collection retaining duplicates using 2 sets
s1 = {'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}

In [13]:
s2 = {'2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0', '2014-01-25 00:00:00.0'}

In [14]:
s1.union(s2)

{'2013-07-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2013-08-25 00:00:00.0',
 '2013-08-26 00:00:00.0',
 '2014-01-25 00:00:00.0'}

In [15]:
len(s1.union(s2))

5

In [16]:
s = list(s1) + list(s2)

In [17]:
s

['2014-01-25 00:00:00.0',
 '2013-07-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2014-01-25 00:00:00.0',
 '2013-08-26 00:00:00.0',
 '2013-08-25 00:00:00.0']

In [18]:
len(s)

6

In [21]:
s = set(s)
list(s)

['2013-07-25 00:00:00.0',
 '2013-08-26 00:00:00.0',
 '2014-01-25 00:00:00.0',
 '2013-07-26 00:00:00.0',
 '2013-08-25 00:00:00.0']
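The order of elements in the two outputs above differs because a `set` is unordered, so `list(s)` can return the elements in any order. A minimal sketch of how the steps fit together, using the same `s1` and `s2` values; the names `combined`, `deduplicated` and `shared` are introduced here for illustration.

```python
s1 = {'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}
s2 = {'2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0', '2014-01-25 00:00:00.0'}

# Concatenating lists keeps the date that appears in both sets twice
combined = list(s1) + list(s2)
print(len(combined))

# Converting back to a set removes the duplicate; sorted() gives a stable order
deduplicated = sorted(set(combined))
print(deduplicated)

# The intersection shows which elements were duplicated
shared = s1 & s2
print(shared)
```

The difference between `len(combined)` and `len(deduplicated)` equals the number of elements the two sets have in common.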