# 1. The File object

> In Python, files, like everything else, are objects.

> Like objects in general, files have attributes and methods, provided by the Python Standard Library.

> Python provides a built-in function called open() that takes a file name as input and returns a file object.


**Usage**

```python
#f is the file object created by the open function

f = open('fileName')

```

**Sample File Object Attributes**

> ```name``` : The name (or path) of the file that was opened

> ```mode``` : The mode the file was opened in

> ```closed``` : True if the file is closed


**Sample File Object Methods**

> ```read(size)``` : Read from the file; size is the maximum number of characters (text mode) or bytes (binary mode) to read

> ```close()``` : Close the file

> ```write(string)``` : Write the string to the file
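A minimal, self-contained sketch of the attributes and methods listed above. It assumes a temporary directory from `tempfile` and a made-up file name `demo.txt`, so it does not depend on `testFile.txt`:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'w' creates the file so there is something to read later
f = open(path, 'w')
print(f.name)    # the path that was passed to open()
print(f.mode)    # 'w'
f.write('Hello, file!')
f.close()
print(f.closed)  # True

# Reopen in read mode and read only the first 5 characters
f = open(path, 'r')
first_five = f.read(5)
print(first_five)  # Hello
f.close()
```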

In [None]:
#Create a new file in Jupyter and name it testFile.txt
#To create a new file go to the project page, click New->Text File, type some random text and save as testFile.txt

#Create a file object f
f = open('testFile.txt')

#Print the attribute name of f
print(f.name)

**Full Usage**

```python

#f is the file object created by the open function

f = open('fileName','Mode',Buffering)


```


> fileName : The name of the file to open. A full system path also works (e.g. C:/Users/file.txt)

> Mode : Optional. Tells Python which operations will be done on the file, e.g. r for reading or w for writing. Defaults to r

> Buffering : Optional. 0 turns buffering off (binary mode only), 1 selects line buffering (text mode only), any other positive integer sets the buffer size in bytes, and a negative value (the default is -1) uses the system default.
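A small sketch of the buffering rules, assuming a temporary file named `buffered.txt`. It shows line buffering in text mode, unbuffered binary reads, and the error Python raises for unbuffered text I/O:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'buffered.txt')

# buffering=1 selects line buffering (text mode only):
# the buffer is flushed each time a newline is written
f = open(path, 'w', buffering=1)
f.write('first line\n')
f.close()

# buffering=0 turns buffering off, which is allowed only in binary mode
f = open(path, 'rb', buffering=0)
data = f.read()
f.close()

# Unbuffered text I/O is not allowed and raises ValueError
unbuffered_text_ok = True
try:
    open(path, 'r', buffering=0)
except ValueError as err:
    unbuffered_text_ok = False
    print('ValueError:', err)
```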



**Common Modes ( lowercase )**

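> r : Read only. The file must already exist

> w : Write only. Overwrites the file if it exists, creates it if it doesn't

> rb : Read only in binary format

> r+ : Read and write. The file must already exist and is not truncated

> w+ : Write and read. Overwrites the file if it exists, creates it if it doesn't

> a : Append ( add to the end of file ). Creates the file if it doesn't exist

> rb+ : Read and write in binary format

> wb : Write in binary format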
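The differences between the common modes are easiest to see side by side. This sketch (assuming a temporary file named `modes.txt`) writes, appends, truncates and reopens the same file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

def read_all(p):
    f = open(p, 'r')
    text = f.read()
    f.close()
    return text

# 'w' creates the file (or truncates an existing one) and writes
f = open(path, 'w')
f.write('one\n')
f.close()

# 'a' keeps the existing content and writes at the end
f = open(path, 'a')
f.write('two\n')
f.close()
after_append = read_all(path)   # 'one\ntwo\n'

# 'w+' truncates again, so the earlier lines are lost
f = open(path, 'w+')
f.write('three\n')
f.seek(0)                       # go back to the start before reading
after_w_plus = f.read()         # 'three\n'
f.close()

# 'r+' needs an existing file and does not truncate it
f = open(path, 'r+')
existing = f.read()             # 'three\n'
f.close()

# 'r' on a file that does not exist raises FileNotFoundError
missing_raised = False
try:
    open(os.path.join(tempfile.mkdtemp(), 'missing.txt'), 'r')
except FileNotFoundError:
    missing_raised = True
```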

In [None]:
#Reuse testFile.txt created earlier (or create a new text file with that name)

#Create a file object f
f = open('testFile.txt','r+',1)

#Print the attribute name of f
print(f.name)

#Print the attribute mode
print(f.mode)

# 2. Reading Files


> Files have to be opened in a mode which supports reading (e.g. r, r+, w+)

> The file object has a few different methods for reading the content of the file

> ```read()``` : Returns the rest of the file (the entire file, if nothing has been read yet) as a string

> ```read(n)``` : Returns the next n characters of the file (n bytes in binary mode)

> ```readline()``` : Returns the next line of the file as a string, including its trailing newline

> ```readlines()``` : Returns all the remaining lines as a list of strings
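Each read method continues from wherever the previous one stopped. A short sketch, assuming a temporary three-line file named `lines.txt`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
f = open(path, 'w')
f.write('alpha\nbeta\ngamma\n')
f.close()

f = open(path, 'r')
chunk = f.read(3)             # 'alp': the first 3 characters
rest_of_line = f.readline()   # 'ha\n': continues from the current position
remaining = f.readlines()     # ['beta\n', 'gamma\n']
at_end = f.read()             # '': nothing is left to read
f.close()
```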


**Full Usage**

```python

#f is the file object created by the open function
f = open('fileName','Mode')

# Returns the entire file as a string
f.read()

#Returns the next n characters of the file (n bytes in binary mode)
f.read(n)

#Returns the next line of the file as a string
f.readline()

# Returns all the remaining lines as a list
f.readlines()


```

In [None]:
#Example
#Open the testFile.txt and type a few lines of text and save it

#Open testFile.txt in read only mode
f = open('testFile.txt','r')

#Print the file Name
print(f.name)

#Read and print the entire file content
print(f.read())

In [None]:
#Example
#Open testFile.txt in read only mode
f = open('testFile.txt','r')

#Read and print the 1st line
print(f.readline())

#Read and print the 2nd line
print(f.readline())

#Read and print the 3rd line
print(f.readline())

In [None]:
#Example
#Open testFile.txt in read only mode
f = open('testFile.txt','r')

#Read and print all lines as list
print(f.readlines())

In [None]:
#Example
#Open testFile.txt in read only mode
f = open('testFile.txt','r')

#Python provides an iterator for file which makes it easy to read lines in a loop
#Each line already ends with '\n', so end='' avoids printing blank lines in between
for line in f:
    print(line, end='')

# 3. Writing To Files


> Files have to be opened in a mode which supports writing (e.g. w, a, r+, w+)

> The file object has a few different methods for writing strings to the file

> ```write('string')``` : Writes the string to the file and returns the number of characters written

> ```writelines(List)``` : Writes each string in a list to the file. No newlines are added between them
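A sketch of what `write()` returns and of the fact that `writelines()` adds no separators, assuming a temporary file named `out.txt`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')
f = open(path, 'w')
count = f.write('abc')          # write() returns the number of characters written: 3
f.writelines(['x', 'y', 'z'])   # written back to back, no newlines added
f.writelines([line + '\n' for line in ['one', 'two']])  # add '\n' yourself
f.close()

f = open(path, 'r')
content = f.read()              # 'abcxyzone\ntwo\n'
f.close()
print(repr(content))
```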


**Full Usage**

```python

#f is the file object created by the open function
f = open('fileName','Mode')

#Some string which has to be written to fileName
fileContent = 'Some String'

stringList = ['String1', 'String2', 'String3']

# Write fileContent to file
f.write(fileContent)

#Write the strings in the list stringList to fileName
f.writelines(stringList)

```

In [None]:
#Example
#Open testFile1.txt in write mode, overwrite if it exists, create it if it doesn't exist
f = open('testFile1.txt','w')

#String to be written in the new file
fileContent = 'This is a test file'

#Write fileContent to the file
f.write(fileContent)

In [None]:
#Example
#Open testFile1.txt in write mode, overwrite if it exists, create it if it doesn't exist
f = open('testFile1.txt','w')

#String to be written in the new file, \n creates a new line
string1 = 'This is 1st line\n'
string2 = 'This is a 2nd line\n'
string3 = 'This is 3rd Line\n'

#Create a list of string1, string2, string3
stringList = [string1,string2,string3]

#Write list of strings in stringList to the file
f.writelines(stringList)

In [None]:
#Example
#Open testFile1.txt in append mode: new text is added to the end, and the file is created if it doesn't exist
f = open('testFile1.txt','a')

#String to be added to the end of the file
string1 = 'This will be added to the end of the file'

#Write to the file
f.write(string1)


# 4. Other File methods

**Closing Files**

> It’s a good practice to always close a file after you are done working with it

> ```close()``` : The close method of the file object is used to close an open file connection

> After you call the close method on a file object, any further attempt to read or write the file raises a ValueError
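A common alternative to calling `close()` yourself is the `with` statement, which closes the file automatically when the block ends, even if an exception is raised inside it. A sketch, assuming a temporary file named `with_demo.txt`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

# The file is closed automatically when the with block ends
with open(path, 'w') as f:
    f.write('closed for you\n')

print(f.closed)  # True

with open(path, 'r') as f:
    text = f.read()

# Outside the block the file is closed, so reading fails
error_message = ''
try:
    f.read()
except ValueError as err:
    error_message = str(err)
print(error_message)
```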

In [None]:
#Example
#Open testFile1.txt in append mode, create it if it doesn't exist
f = open('testFile1.txt','a')

#String to be added to the end of the file
string1 = 'This will be added to the end of the file'

#Write to the file; write() returns the number of characters written, which is printed
print(f.write(string1))

#Closing the file connection
f.close()

#Reading a closed file fails with a ValueError
try:
    print(f.read())
except ValueError as err:
    print('ValueError:', err)

**Seek**

> The seek method moves the current position in the file to a given offset.

> Any subsequent read or write operations will be done starting from the position you specified in the seek method


**Usage**

```python
f.seek(position, starting_point)

```

> Position : The offset to move to ( in bytes ), measured from starting_point.

> Starting_point : Optional. This can be 0, 1 or 2: 0 measures from the beginning of the file, 1 from the current position and 2 from the end. Defaults to 0. Non-zero offsets from the current position (1) or from the end (2) require the file to be opened in binary format ( using b ).

> Use negative offsets (with starting point 1 or 2) to move backwards. In text mode, Python only guarantees offsets from the beginning that are 0 or were returned by tell(); seeking to a plain byte count such as 10 works as expected for ASCII text.
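A sketch of the three starting points on a known 10-byte binary file (the file name `seek_demo.bin` is made up; `with` simply closes the file at the end of the block):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek_demo.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789')        # 10 bytes

with open(path, 'rb') as f:
    f.seek(3, 0)                  # 3 bytes from the start
    from_start = f.read(2)        # b'34', position is now 5
    f.seek(2, 1)                  # 2 bytes forward from the current position: 7
    from_current = f.read(1)      # b'7', position is now 8
    f.seek(-3, 2)                 # 3 bytes back from the end: 7
    from_end = f.read()           # b'789'
```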

In [None]:
#Example
#Open a file for reading
f = open('testFile1.txt','r')

#Go to 10th byte from the start of the file
f.seek(10,0)

#This would print the file from position 10 to the end
print(f.read())

#Go to the 30th byte from the start of the Document
f.seek(30,0)

#This would print the file from position 30 to the end
print(f.read())

**Tell**

> The tell method returns the position in the file that Python is currently working at

> In binary mode this is the number of bytes from the start of the file; in text mode it is an opaque number that can be passed back to seek()
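A sketch of how `tell()` tracks reads in binary mode, assuming a temporary file named `tell_demo.bin`. It also shows that a position saved with `tell()` can be handed back to `seek()`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'tell_demo.bin')
with open(path, 'wb') as f:
    f.write(b'line one\nline two\n')

with open(path, 'rb') as f:
    start = f.tell()            # 0: nothing has been read yet
    f.readline()                # reads b'line one\n', which is 9 bytes
    after_line = f.tell()       # 9
    f.seek(0, 2)                # jump to the end of the file
    size = f.tell()             # 18: the file size in bytes
    f.seek(after_line)          # return to the saved position
    second_line = f.readline()  # b'line two\n'
```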


**Usage**
```python
f.tell()

```

In [None]:
#Example
#Open a file for reading
f = open('testFile1.txt','rb')

#Go to 10th byte from the start of the file
f.seek(10,0)

#Print the current position, which is 10 after the seek
print(f.tell())

#Go to the 10th byte from the end of the document
f.seek(-10,2)

#This will return the current position, which is total bytes - 10 after the previous seek
print(f.tell())

# 5. Functional Programming

> Object Oriented Programming organizes programs around classes and objects

> Functional programming is another way to write code

> Python supports functional programming in addition to object oriented programming

> In functional programming, the code logic is written around functions

> The basic characteristic of a function in functional programming is that it does not change data that exists outside of the function (i.e. functions are not dependent on, or coupled with, any data outside the function)

> Functions in functional programming are stateless: the same inputs always produce the same output
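A sketch contrasting a function with a side effect and a pure function. The names `record` and `with_value` are made up for illustration:

```python
# Not functional: modifies a list that lives outside the function
totals = []

def record(value):
    totals.append(value)   # side effect: changes outside data
    return len(totals)

# Functional: builds and returns a new list, leaving its input untouched
def with_value(items, value):
    return items + [value]

original = [1, 2]
updated = with_value(original, 3)
print(original)  # [1, 2]
print(updated)   # [1, 2, 3]

# Same input, different results, because record() depends on outside state
first = record(10)   # 1
second = record(10)  # 2
print(totals)        # [10, 10]
```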

**Non-Functional approach**

```python
#Variable a defined outside the function
a = 5

#Function to add two numbers, one of the numbers, a, is defined outside the function
#(named add rather than sum to avoid hiding Python's built-in sum function)
def add(b):
    return a+b

#Call the add function
add(4)
```

**Functional approach**

```python

def add(a,b):
    return a+b

#Call the add function
add(5,4)
```

# Lambdas

> Lambda expressions can be used to write small functions without the def keyword

> A lambda expression starts with the keyword lambda, followed by the input arguments, a colon, and an expression whose value is returned (no return keyword is written)

> The body can only be a single expression

> Lambda functions can be assigned to a variable and called like any other function

> They are also referred to as anonymous functions

> Lambda functions can be passed as arguments to other functions
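A sketch of passing lambdas as arguments: once to the built-in `sorted()` through its `key` parameter, and once to `apply_twice`, a hypothetical helper defined here only for illustration:

```python
words = ['pear', 'fig', 'banana']

# sorted() calls the lambda on each item to get its sort key
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'pear', 'banana']

# Hypothetical helper that accepts another function as an argument
def apply_twice(func, value):
    return func(func(value))

result = apply_twice(lambda x: x * 3, 2)
print(result)  # 18
```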

**Usage**

```Python

lambda inputs : expression

```

In [None]:
#Example of a regular function using def

#Define a function add to add two numbers a and b
def add(a,b):
    return a+b

#Call the function add
c = add(4,5)
print(c)

In [None]:
#The same example using Lambda expression

add = lambda a,b : a+b

c = add(4,5)
print(c)

# Map, Filter & Reduce

# 6. Map

> Map is a built-in function which accepts a function and one or more iterables ( list, tuple, etc. ) as input arguments/parameters

> It applies the passed function to every element of the given iterable(s) and returns a map object (an iterator) that yields the new values; wrap it in list() to get a new list

> For example: there is a list of 5 numbers and a function 'a' which takes a number and multiplies it by 10.
> If you pass them both to map, the function a is applied to every item in the list, producing the new values
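Because `map()` returns an iterator, it can only be consumed once. This sketch shows that, the equivalent list comprehension, and how `map()` behaves when two iterables have different lengths:

```python
listA = [1, 2, 3, 4, 5]

mapped = map(lambda x: x * 10, listA)
print(mapped)                # a map object, not a list
first_pass = list(mapped)    # [10, 20, 30, 40, 50]
second_pass = list(mapped)   # [] because the iterator is already used up

# The same result with a list comprehension
comprehension = [x * 10 for x in listA]

# With two iterables, map stops at the end of the shorter one
pairs = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))  # [11, 22]
```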

**Usage**

```Python
map(function, iterable, ...)
```

In [None]:
#Example
#Defining a list A
listA = [1,2,3,4,5]

print(listA)

#Defining the functionA
def functionA(n):
  return n * 10


#Applying the map function
mapA = map(functionA, listA)

#Converting the map object into a list
newListA = list(mapA)

#Printing the new list
print(newListA)

In [None]:
#Example using Lambda
#Defining a list A
listA = [1,2,3,4,5]

print(listA)

#Using the lambda function in map
mapA = map(lambda x : x*10, listA)

#Converting the map object into a list
newListA = list(mapA)

#Printing the new list
print(newListA)

In [None]:
# Example with multiple lists


#Defining a list A
listA = [1,2,3,4,5]

listB = [1,2,3,4,5]

print('List A : ', listA)

print('List B :', listB)

#Applying the map function
mapA = map(lambda x,y : x*y, listA, listB)

#Converting the map object into a list
newListA = list(mapA)

#Printing the new list
print(newListA)

# Filter

> The filter function accepts a function and a single iterable (unlike map, which accepts several)

> The function passed to filter should return a boolean ( True or False )

> Filter applies the function to every item in the iterable and returns a filter object (an iterator) that yields only the items for which the function returned True; wrap it in list() to get a new list
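Strictly, filter keeps every item for which the function's result is truthy, and passing `None` instead of a function keeps the truthy items themselves. A sketch on a mixed list of made-up values:

```python
values = [0, 5, '', 'text', None, 12]

# Keep only integers greater than 4
numbers_over_4 = list(filter(lambda v: isinstance(v, int) and v > 4, values))  # [5, 12]

# With None as the function, filter keeps the truthy items
truthy = list(filter(None, values))  # [5, 'text', 12]

# The same selection written as a list comprehension
also_over_4 = [v for v in values if isinstance(v, int) and v > 4]
```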


**Usage**

```Python
filter(function, iterable)
```

In [None]:
#Example
#Define a list of numbers
listA = [1,21,12,4,11]

#Define a function which returns true if the number passed is greater than 10
def functionA(n):
    return (n > 10)

#Pass function A and listA to the filter
filterA = filter(functionA, listA)

#Convert filter object into a list
newListA = list(filterA)


#Print the new list, only numbers greater than 10 will appear in the result
print(newListA)

In [None]:
#Example using Lambda
#Define a list of numbers
listA = [1,21,12,4,11]

#Use Lambda function to filter
newListA = list(filter(lambda x:x>10, listA))

#Print the new list, only numbers greater than 10 will appear in the result
print(newListA)

# Reduce

> Reduce accepts a function and an iterable as the input arguments

> It reduces the given iterable (e.g. a list) to a single value

> The passed function should accept 2 arguments and return a single value

> The function is first applied to the first 2 elements in the given list, producing a single output. It is then applied to that output and the next item in the list. This process repeats until every item has been used, leaving a single value

> The reduce function lives in the functools module of the standard library. ```from functools import reduce``` imports the function into the current file.
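To make the step-by-step process visible, this sketch logs every pair the function receives (`add_and_log` is a made-up name). It also shows the optional third argument of `reduce`, a starting value that is returned as-is for an empty iterable:

```python
from functools import reduce

steps = []

def add_and_log(x, y):
    steps.append((x, y))   # record the pair this call receives
    return x + y

total = reduce(add_and_log, [1, 2, 3, 4])
print(steps)  # [(1, 2), (3, 3), (6, 4)]
print(total)  # 10

# Optional starting value: also the result for an empty iterable
empty_total = reduce(lambda x, y: x + y, [], 0)        # 0
product = reduce(lambda x, y: x * y, [1, 2, 3, 4], 1)  # 24
```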


![alt text](https://github.com/soulzcore/iacc_python_2018/raw/master/week3/images/filter.png "Filter Function")



**Usage**

```Python

reduce(function, iterable)

```

In [None]:
#Example
#Import the reduce function from functools module
from functools import reduce

#Define a list of numbers
listA = [1,2,3,4]

#Define a function to add 2 numbers
def functionA(x,y):
    return x+y

#Pass functionA and listA to reduce
reduceA = reduce(functionA,listA)

#Print the result
print(reduceA)

In [None]:
#Example using lambda
#Import the reduce function from functools module
from functools import reduce

#Define a list of numbers
listA = [1,2,3,4]

#Reduce listA using a lambda function
print(reduce(lambda x,y : x+y,listA))

# 7. Data Manipulation & Analysis

# Pandas

> Pandas is a Python library (package) commonly used for data analysis and manipulation

> It provides flexible tools for loading, cleaning, reshaping, aggregating and joining data

> It is widely adopted in the data engineering and data science community

> Pandas adds support for working with different kinds of data; some common ones are:
> * Tabular data
> * Time series data
> * Matrices

> Pandas supports reading from and writing to text, CSV, Excel, SQL databases, etc.

> Pandas is fast because most of its operations are vectorized and run in optimized compiled code (largely via NumPy) rather than in Python loops

> Pandas is built on top of NumPy, the scientific computing package for Python

> Pandas introduces three data structures
> * Series – One dimensional labeled arrays ( think lists )
> * Data Frames – 2 dimensional data structures ( think tables or Excel spreadsheets with rows and columns )
> * Panels – 3 dimensional data structures. Panel was deprecated in pandas 0.20 and removed in 0.25; a DataFrame with a MultiIndex is used instead

**Installation**

> Install pandas from PyPI by running the following command in a command prompt or console:

```shell
pip install pandas

```

> or in Jupyter by running the command

```python
!pip install pandas

```

# Series

> A Series is a one dimensional labeled array, backed by a NumPy ndarray

> ndarrays are the N-dimensional arrays defined by NumPy, the library pandas is built on

> Series can be created from lists, dicts, NumPy arrays, scalar values, etc.

> A Series consists of an index and the data

> The index can be predefined, added later, changed, or default to a sequence of numbers (0, 1, 2, ...)

![alt text](https://github.com/soulzcore/iacc_python_2018/raw/master/week4/images/series.png "Pandas Series")

In [None]:
!pip install pandas

In [None]:
#Series Example
#import the pandas module
import pandas as pd

#Create a list of numbers
listOfNumbers = [1,3,5,7,9,0]


#Create a series from the list by applying an index (using the Series class from the module)
s = pd.Series(listOfNumbers, index=['a','e','i','o','u','z'])

#print the series
print(s)

In [None]:
#Example
#import the pandas module
import pandas as pd

#Create a list of numbers
listOfNumbers = [1,3,5,7,9,0]


#Create a series from the list by applying an index (using the Series class from the module)
s = pd.Series(listOfNumbers, index=['a','e','i','o','u','z'])

#print the series
print(s)


#Call the sort_values method to sort the series
print(s.sort_values())

#Print the value with the row index i
print(s['i'])

#Print the value with the row index u
print(s['u'])


**Sample Attributes**

> shape – Returns a tuple with the number of rows; for a Series it has a single element, e.g. (6,)

> size – Returns the number of items in the Series

> values – Returns the series as an ndarray ( list like )



**Sample Methods**

> describe() – Generates descriptive statistics like count, mean, percentiles etc.

> head(n) – Return first n rows

> tail(n) – Return last n rows

> groupby() – groups values by key

> max() – return max value

> min() – return the minimum value
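A sketch of these attributes and methods on the same vowel-indexed Series used in the examples above (requires pandas):

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9, 0], index=['a', 'e', 'i', 'o', 'u', 'z'])

print(s.shape)           # (6,): one element, because a Series is one dimensional
print(s.size)            # 6
print(s.values)          # the underlying NumPy array
print(s.head(2))         # the rows labeled 'a' and 'e'
print(s.max(), s.min())  # 9 0
print(s.describe())      # count, mean, std, min, quartiles and max
```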



# Data Frames


> Data Frames are 2 dimensional labeled data structures; each column can hold a different data type

> Think of it as a table, spreadsheet or matrix

> Dataframes can be created from multiple Series, arrays, dicts, lists, CSV files, database tables, etc.

> Dataframes consist of the data, an index and multiple columns

> The index can be predefined, added later, changed, or default to a sequence of numbers



![alt text](https://github.com/soulzcore/iacc_python_2018/raw/master/week4/images/dataframes.png "Dataframes")


![alt text](https://github.com/soulzcore/iacc_python_2018/raw/master/week4/images/dataframes1.png "Dataframes")

In [None]:
#Example

import pandas as pd

#Create a dict of lists, numbers and words are keys
d = {
    'numbers' : [4, 2, 1, 19],
    'words' : ['a', 'f', 'c', 'z']
}

#Create a dataframe from d
df = pd.DataFrame(d)

#Print the dataframe
print(df)


In [None]:
#Example
import pandas as pd

#Create a dict of lists, numbers and words are keys
d = {
    'numbers' : [4, 2, 1, 19],
    'words' : ['a', 'f', 'c', 'z']
}

#Create a dataframe from d
df = pd.DataFrame(d)

#Print the dataframe
print(df)

#Sort the dataframe by the words column
print(df.sort_values(by='words'))

#Sort by the numbers column in descending order
print(df.sort_values(by='numbers',ascending=False))

#Print the number of rows and columns
print(df.shape)

#Print descriptive statistics
print(df.describe())

**Sample Attributes**

> shape – Returns a tuple of number of rows and columns

> size – Returns the number of items ( rows × columns ) in the DataFrame

> values – Returns the DataFrame data as a 2 dimensional ndarray


**Sample Methods**

> describe() – Generates descriptive statistics like count, mean, percentiles etc.

> head(n) – Return first n rows

> tail(n) – Return last n rows

> groupby() – groups rows by the values in one or more columns

> max() – return max value

> min() – return the minimum value
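A sketch of `shape`, `size` and `groupby()` on a small DataFrame. The team and points data are made up for illustration (requires pandas):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    'team': ['red', 'blue', 'red', 'blue', 'red'],
    'points': [3, 5, 4, 1, 2],
})

print(df.shape)  # (5, 2)
print(df.size)   # 10: rows * columns

# groupby() splits the rows by team; sum() and size() aggregate each group
points_per_team = df.groupby('team')['points'].sum()
rows_per_team = df.groupby('team').size()
print(points_per_team)  # blue 6, red 9
print(rows_per_team)    # blue 2, red 3
```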

# 8. USE CASE – DALLAS CITY CRIME ANALYSIS


**Dataset details**

> The Dallas Police Incidents dataset is publicly available and contains incident reports from 2014 to 2017

> The Dataset can be downloaded in csv format from : https://www.dallasopendata.com/api/views/qv6i-rri7/rows.csv?accessType=DOWNLOAD

> The Metadata can be found at https://www.dallasopendata.com/Public-Safety/Police-Incidents/qv6i-rri7


**Questions**

> Find the number of rows and columns in the dataset

> Find the top 5 offences reported and their count

> Find the average time taken to send a dispatch after receiving the incident report

> Find the top 5 longest dispatch times ( time between receiving a call and sending out a dispatch )

> Find the 10th, 25th, 50th and 80th percentiles of dispatch times
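The dispatch-time questions come down to subtracting two timestamp columns. This sketch runs that idea on three made-up rows (not taken from the Dallas dataset); the column names are assumed to match the ones used in the solution below (requires pandas):

```python
import pandas as pd

# Made-up timestamps for illustration only, NOT from the Dallas dataset
calls = pd.DataFrame({
    'Call Received Date Time': ['2017-01-01 10:00:00', '2017-01-01 11:00:00', '2017-01-02 09:30:00'],
    'Call Dispatch Date Time': ['2017-01-01 10:30:00', '2017-01-01 13:00:00', '2017-01-02 09:45:00'],
})

received = pd.to_datetime(calls['Call Received Date Time'])
dispatched = pd.to_datetime(calls['Call Dispatch Date Time'])

# Subtracting datetimes gives timedeltas; .dt.total_seconds() turns them into seconds
hours = (dispatched - received).dt.total_seconds() / 3600

print(hours.tolist())                              # [0.5, 2.0, 0.25]
print(hours.mean())                                # average time to dispatch
print(hours.sort_values(ascending=False).head(2))  # longest dispatch times
print(hours.quantile([0.10, 0.25, 0.50, 0.80]))    # requested percentiles
```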

In [None]:
#Download file using urllib
import urllib.request


print('Starting Download.')

url = 'https://www.dallasopendata.com/api/views/qv6i-rri7/rows.csv?accessType=DOWNLOAD'
urllib.request.urlretrieve(url, 'dallas_crime.csv')

print('Download Complete')

In [None]:
#Solution
import pandas as pd

# Create a variable with the csv filename we would like to analyze
filename = 'dallas_crime.csv'

# Create a data frame from the csv file
df = pd.read_csv(filename,header=0, index_col = False)


# Print All columns in the csv file as list
print(df.columns.values.tolist())



# Create a new dataframe df1 from selected columns of interest from the original df
# .copy() makes df1 independent of df, avoiding a SettingWithCopyWarning when columns are assigned later
df1 =  df[['Year of Incident','Type of Incident','Call (911) Problem','Type  Location','Type of Property','Incident Address','Zip Code','City','State','Date of Report','UCR Offense Name','Call Received Date Time','Call Dispatch Date Time']].copy()


# Number of columns in the new dataframe df1
print('Number of Columns : ' , len(df1.columns))


# Print All columns as list in df1
print(df1.columns.values.tolist())


# Number of rows in df1
print('Number of rows :' , len(df1))

# Print default metrics of df1
print(df1.describe())

# The type function can be used to determine the datatype. It should be DataFrame
print(type(df1))



# Find all unique incidents in the dataset
incident_types = df1['UCR Offense Name'].unique()

# Number of Unique type of incidents
print('Number of unique incident types', len(incident_types))


# Create a new dataframe inc_group by grouping incidents and counting the number ( size ) of each type of incident; the counts go into a column named count
inc_group =  df1.groupby('UCR Offense Name').size().reset_index(name='count')


# Sort the new dataframe in descending order, highest number incidents first
inc_group = inc_group.sort_values('count', ascending=False)


# Print the top 5 incidents
print('\n\n------ The Top 5 incidents ----- \n\n')
print(inc_group.head(5))


# Convert the Call Received Date Time column of df1 from strings into datetime objects
df1['Call Received Date Time'] = pd.to_datetime(df1['Call Received Date Time'])




# Convert the Call Dispatch Date Time column of df1 from strings into datetime objects
df1['Call Dispatch Date Time'] = pd.to_datetime(df1['Call Dispatch Date Time'])


# Calculate the time between call received and call dispatched; the result is a timedelta
df1['Time to Dispatch'] = (df1['Call Dispatch Date Time'] - df1['Call Received Date Time'])


# Convert the time to dispatch column into hours, sort and save it as a new series
time_to_dispatch =  df1['Time to Dispatch'].apply(lambda x: x.total_seconds()/60.0 / 60.0).sort_values(ascending=False)


# Print the longest 5 dispatch times
print('\n\n------ The 5 longest dispatch times are ----- \n\n')
print(time_to_dispatch.head(5))


# Print descriptive statistics with the 10th, 25th, 50th and 80th percentiles
print('\n\n------ Descriptive statistics with the 10th, 25th, 50th and 80th percentiles ----- \n\n')
print(time_to_dispatch.describe(percentiles = [.10, .25, .50, .80]))