# Sets, Tuples, Functions, and Files

## Sets
A set is an "unordered collection of unique and immutable objects that supports operations corresponding to mathematical set theory."  I don't know anything about mathematical set theory, so I'll show you how I use sets in Python.  Perhaps one of you mathematicians can explain set theory?
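
As a rough sketch of what "operations corresponding to mathematical set theory" means in practice, Python sets support union, intersection, difference, and subset tests through operators. The names `evens` and `small` below are made up for illustration.

```python
# Two example sets (illustrative names, not from the original notebook)
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

# Union: elements in either set
union = evens | small
# Intersection: elements in both sets
common = evens & small
# Difference: elements in evens but not in small
only_evens = evens - small
# Symmetric difference: elements in exactly one of the sets
either_not_both = evens ^ small
# Subset test: is every element of {2, 4} also in evens?
is_subset = {2, 4} <= evens

print(union, common, only_evens, either_not_both, is_subset)
```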

In [None]:
# Note that a set looks like a dictionary with only keys (no values)
x = {1, 2, 3, 4}
x

In [None]:
x = {2, 2, 1, 1, 5, 4, 5, 4, 1, 5}
x
# Similar to a dictionary in that if you assign a new value to an existing key
# it doesn't replicate the key, it replaces the value.

In [None]:
# Useful for removing duplicates from a list
a = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
a

In [None]:
a = list(set(a))
a

In [None]:
b = ["a", "one", "a", "one", "b"]
b

In [None]:
b = list(set(b))
b

In [None]:
# Comprehensions with sets
{x**2 for x in [1,1,2,2,3,3,4,4,5,5]}

In [None]:
# compare to this version
[x**2 for x in {1,1,2,2,3,3,4,5,5}]

In [None]:
# compare the same comprehension with a list rather than a set
[x**2 for x in [1,1,2,2,3,3,4,4,5,5]]

In [None]:
# Creating a list of the unique words in a string
s = "This one sentence has multiple instances of one and has multiple instances of other words also."
s

In [None]:
# All the words
s.replace('.','').split()

In [None]:
# Unique words
list(set(s.replace('.','').split()))
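
One caveat with `list(set(...))`: because sets are unordered, the original order of the items is not guaranteed to survive. If order matters, one possible approach (a sketch, assuming Python 3.7 or later, where dictionaries preserve insertion order) is `dict.fromkeys()`. The variable names here are illustrative.

```python
words = "one and two and one and three".split()

# dict.fromkeys() keeps the first occurrence of each key, in insertion order
unique_in_order = list(dict.fromkeys(words))
print(unique_in_order)

# A set holds the same unique words, but with no guaranteed order
print(set(unique_in_order) == set(words))
```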

## Tuples
A tuple is like a list, except that it's immutable.  We've already seen tuples in interactive Python: a cell that ends with several comma-separated expressions displays its result as a tuple.

In [None]:
# create a tuple, t
t = (1, 2, 3, 'dog')
# create an anonymous tuple containing 3 tuples and an integer
t, t[0], t[:3], t[2:]

In [None]:
# Unlike lists, tuples are immutable, so this assignment raises a TypeError
t[3] = 'cat'
# This can make them useful if you want to guarantee that the values can't be changed

In [None]:
# Same thing using a list
t = [1, 2, 3, 'dog']
t[3] = 'cat'
t
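
To see the immutability without stopping the notebook on an error, the assignment can be wrapped in `try`/`except`. The sketch below also shows one practical consequence: a tuple of hashable items can be used as a dictionary key, while a list cannot. The `point` and `labels` names are hypothetical.

```python
point = (3, 4)

# Item assignment on a tuple raises TypeError
try:
    point[0] = 10
    changed = True
except TypeError as err:
    changed = False
    print("Cannot modify a tuple:", err)

# Tuples of hashable items can serve as dictionary keys
labels = {point: "corner"}
print(labels[(3, 4)])

# Lists are mutable and unhashable, so they cannot be keys
try:
    bad = {[3, 4]: "corner"}
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(changed, list_key_ok)
```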

## Functions

A function accepts an optional list of arguments, does something, and optionally returns one or more objects.
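
As a minimal illustration of those three parts (arguments, work, return value), here is a small hypothetical function, `min_max()`, that is not part of the original notebook. It takes one required argument and one optional argument with a default, and it returns two objects, which Python packs into a tuple.

```python
def min_max(values, scale=1):
    """Return the smallest and largest items of values, each multiplied by scale."""
    lowest = min(values) * scale
    highest = max(values) * scale
    # Returning two comma-separated objects produces a tuple
    return lowest, highest

# Call with only the required argument
result = min_max([4, 9, 1, 7])
print(result, type(result))

# Unpack the returned tuple and override the default argument
lo, hi = min_max([4, 9, 1, 7], scale=10)
print(lo, hi)
```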

In [None]:
#  Define a couple of functions -- just for kicks (demonstration, that is) -- would use NumPy for real stuff. 
#
# mean() - compute the sample mean.
#     Parameters:
#         N a list of numbers
#
def mean( N ):
    # running total
    Total = 0
    # count of the number of items
    Count = len(N)
    # for each item in the list
    for Num in N:
        # increment the total
        Total = Total + Num
    # compute the sample average
    average = float(Total)/Count if Count > 0 else 0
    return(average)

#
# std_dev() - compute the sample standard deviation.
#     Parameters:
#         N a list of numbers
#
def std_dev( N ):
    Count = len(N)
    # Compute the average
    average = mean(N)
    if Count > 1:
        # Sum the squared deviations from the mean
        Total = 0
        for Num in N:
            Total = Total + (float(Num) - average)**2
        # Divide by (Count - 1) for the sample variance, then take the square root
        sd = (Total/(Count-1))**0.5
    else:
        # Fewer than two values: return 0
        sd = 0
    return sd

In [None]:
l = [1, 2, 3, 4, 5, 6]
mean(l), std_dev(l)

In [None]:
# Human-friendly
l = [1, 2, 3, 4, 5, 6]
print("The average is {:.2f} and the standard deviation is {:.2f}".format(mean(l), std_dev(l)))

# NumPy versions
import numpy as np
print("NumPy: The average is {:.2f} and the standard deviation is {:.2f}".format(np.mean(l), np.std(l)))
# Oops -- why aren't the two standard deviations the same?  Consider using the Help (np.std?)

In [None]:
np.std?

In [None]:
# Need to set the ddof ("delta degrees of freedom") parameter to 1 if you want a sample
# standard deviation (divide by n-1) rather than a population standard deviation (divide by n)
np.std(l, ddof=1)
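
The same sample-versus-population distinction shows up in Python's standard `statistics` module, which can serve as a cross-check that doesn't require NumPy. In this sketch, `statistics.stdev()` divides by n - 1 (sample) and `statistics.pstdev()` divides by n (population). The variable names are illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5, 6]

# Sample standard deviation (divides by n - 1), like np.std(data, ddof=1)
sample_sd = statistics.stdev(data)
# Population standard deviation (divides by n), like np.std(data) by default
population_sd = statistics.pstdev(data)

print("Sample: {:.4f}  Population: {:.4f}".format(sample_sd, population_sd))
```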

## Files

Reading and writing files is very important in data science applications.

In [None]:
# Before executing this, have a look at the data file in a text editor.
#
# Make sure that you understand the file path notation -- ../data means "go up one level, then go to
# the data directory from that location" - this is a relative path.
f = open('../data/data.txt', 'r')
print(f, type(f))
f.close()
# Note here that we open the file, do what we want, and then close the file.  If we
# want to do more later, we'll have to reopen the file.

In [None]:
# Read one line from the file
f = open('../data/data.txt', 'r')
line = f.readline()
f.close()
line, type(line)

In [None]:
# '\n' is the newline character that signifies the end of a line
# in a text file.  To remove it, use strip()
f = open('../data/data.txt', 'r')
line = f.readline().strip()
f.close()
line, type(line)
# Note that we have chained the readline() and strip() methods.

In [None]:
# If we want integers
f = open('../data/data.txt', 'r')
line = int(f.readline().strip())
f.close()
line, type(line)
# If you want floats instead, use float()

In [None]:
# -------------------------------------
# reads a file consisting of 
#   integers (1 integer on each line) and
#   stores the integers in a list
# -------------------------------------
# Open the file
f = open('../data/data.txt', 'r')
# Setup the list for the values
vals = []
# Priming read
l = f.readline()
# Loop until l is empty (readline() returns '' at end of file)
while l:
    # Make sure it's not a blank line
    if l.rstrip():
        # Strip the newline and convert to integer
        vals.append(int(l.rstrip()))
    # read the next line
    l = f.readline()
# close the file
f.close()

In [None]:
vals

In [None]:
# same thing with a single comprehension
vals1 = [int(i.rstrip()) for i in open('../data/data.txt','r') if i.rstrip()]
vals == vals1
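
A more common idiom for the loop above is a `with` block, which closes the file automatically even if an error occurs, together with iterating over the file object directly. So that the sketch is self-contained, it writes a small temporary file first; the file contents and the names `path` and `numbers` are made up for illustration and stand in for `../data/data.txt`.

```python
import os
import tempfile

# Create a small stand-in data file: one integer per line, with a blank line
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("10\n20\n\n30\n")
    path = tmp.name

numbers = []
# The with block closes the file when the block ends
with open(path, 'r') as f:
    # Iterating over a file object yields one line at a time
    for line in f:
        # Skip blank lines
        if line.strip():
            numbers.append(int(line.strip()))

print(numbers, f.closed)

# Remove the temporary file
os.remove(path)
```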

In [None]:
# Show the mean and std dev.
print("Mean: {:.3f}".format(mean(vals)))
print("Std. Dev.: {:.3f}".format(std_dev(vals)))

In [None]:
# Write the values to a new file with the values separated by commas (a single line)
f = open('../data/data1.txt', 'w')
for j in range(len(vals)):
    # need the comma for all but the last one
    if j < len(vals)-1:
        f.write("{:},".format(vals[j]))
    else :
        f.write("{:}".format(vals[j]))
f.close()
# Be sure to have a look at the file that you just wrote.

In [None]:
# same outcome using a string join.
f = open('../data/data1.txt', 'w')
f.write(",".join([str(v) for v in vals]))
f.close()

In [None]:
# read and show the new file
vals1 = [i.rstrip() for i in open('../data/data1.txt','r') if i.rstrip()]
vals1

In [None]:
# What about csv files?
#  See https://docs.python.org/3/library/csv.html for more details
import csv
with open('../data/reg_sample.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # see the row objects returned from the reader iterator
        print(row)
        # see a more user-friendly version
        #print(', '.join(row))

In [None]:
# Or to create a list of lists from the csv file:
with open('../data/reg_sample.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    vals = [row for row in reader]

In [None]:
# Now you can do whatever you want with the data
vals
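
Keep in mind that `csv.reader` returns every field as a string, so numeric columns need converting before any arithmetic. The sketch below uses made-up, in-memory data (the contents of `reg_sample.csv` are not shown here, so the column names `x` and `y` and the values are purely hypothetical).

```python
import csv
import io

# Hypothetical CSV text standing in for a real file
sample = io.StringIO("x,y\n1,2.5\n2,4.1\n3,6.2\n")

reader = csv.reader(sample, delimiter=',')
# The first row holds the column names
header = next(reader)
# Convert the remaining fields from strings to floats
rows = [[float(field) for field in row] for row in reader]

print(header)
print(rows)
```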