# Chapter 9 - Files and Exceptions
Notebook by: Lindsey Sullivan

- [Files](#Files)
- [Text-File Processing](#Text-File-Processing)
- [Updating Text Files](#Updating-Text-Files) 
- [Serialization with JSON](#Serialization-with-JSON)
- [pickle Serialization and Deserialization](#pickle-Serialization-and-Deserialization)
- [Additional Notes Regarding Files](#Additional-Notes-Regarding-Files)
- [Handling Exceptions](#Handling-Exceptions)
- [finally Clause](#finally-Clause)
- [Explicitly Raising an Exception](#Explicitly-Raising-an-Exception)


## Files

- Python views a **text file** as a sequence of characters and a **binary file** as a sequence of bytes. 
- Similar to lists, the first character in a text file and the first byte in a binary file are located at position 0.
- For each file you **open**, Python creates a **file object** that you'll use to interact with the file. 

### End of file
- **End-of-file marker**: denotes the end of a file.

### Standard File Objects
- Python creates 3 **standard file objects**:
 - sys.stdin - standard input file object
 - sys.stdout - standard output file object
 - sys.stderr - standard error file object

## Text-File Processing

- write(): writes to a file; you may also use print() with its file keyword argument.
### with Statement
- acquires a resource and assigns its corresponding object to a variable.
- allows the application to use the resource via that variable. 
- calls the resource object's *close* method to release the resource when program control reaches the end of the *with* statement's suite.
### open Function
- **open()** opens the text file and associates it with a file object.
- the mode '*w*' opens the file for writing, creating the file if it does not exist. If the file already exists, its contents are deleted. 
### Writing to the file
- write(): writes a string to the file
- close(): closes the file
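
The points above can be sketched as follows. This is a minimal example, assuming a scratch file named notes_demo.txt in a temporary directory (both are illustrative and not part of the chapter). It writes one line with write and one with print, then checks that the with statement closed the file.

```python
import os
import tempfile

# Hypothetical scratch location so the example does not touch real files.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'notes_demo.txt')

with open(demo_path, mode='w') as notes:
    notes.write('100 Jones 24.98\n')      # write does not add a newline for you
    print('200 Doe 345.67', file=notes)   # print appends the newline itself

# The with statement has already closed the file at this point.
file_was_closed = notes.closed

with open(demo_path, mode='r') as notes:
    contents = notes.read()

print(file_was_closed)
print(contents, end='')
```

Because the with statement closes the file, no explicit call to close is needed here.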

#### Self Check
1. The _*with*_ statement implicitly releases resources when its suite finishes executing.
2. True/False - it's good practice to keep resources open until your program terminates. 
    *False* - It's good practice to close resources as soon as the program no longer needs them. 
3. Create a grades.txt file and write to it the following three records consisting of student IDs, last names and grades:

In [1]:
with open('grades.txt', mode='w') as grades:
    grades.write('1 Red A\n')
    grades.write('2 Green B\n')
    grades.write('3 White A\n')

### Reading Data from a Text File
- The method **readlines** can be used to read an entire text file; it returns a list of strings, one per line.
- While reading through a file, the system maintains a **file-position pointer** representing the location of the next character to read.
    - To reposition the file-position pointer, you can use the **seek** method.
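
A short sketch of readlines and seek, assuming a scratch copy of the grades data in a temporary directory (the file name grades_demo.txt is illustrative). After readlines the file-position pointer sits at the end of the file, so a further read returns an empty string until seek(0) moves the pointer back.

```python
import os
import tempfile

# Hypothetical scratch file holding the same records as grades.txt.
demo_path = os.path.join(tempfile.mkdtemp(), 'grades_demo.txt')

with open(demo_path, 'w') as grades:
    grades.write('1 Red A\n2 Green B\n3 White A\n')

with open(demo_path, 'r') as grades:
    lines = grades.readlines()       # list of all lines; pointer is now at end of file
    after_eof = grades.read()        # nothing left to read
    grades.seek(0)                   # move the file-position pointer back to the start
    first_line = grades.readline()   # reading resumes from position 0

print(lines)
print(repr(after_eof))
print(repr(first_line))
```
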
#### Self Check
1. A file object's _*seek*_ method can be used to reposition the file-position pointer.
2. True/False - By default, iterating through a file object with a *for* statement reads one line at a time from the file and returns it as a string. 
    - **True**
3. Read the file grades.txt that you created in the previous self check and display it in columns with the column heads 'ID', 'Name' and 'Grade'.

In [4]:
with open('grades.txt','r') as grades:
    print(f'{"ID":<4}{"Name":<7}{"Grade"}')
    for record in grades:
        student_id, name, grade = record.split()
        print(f'{student_id:<4}{name:<7}{grade}')

ID  Name   Grade
1   Red    A
2   Green  B
3   White  A


## Updating Text Files
The example below first creates accounts.txt, then uses a with statement to update the file, changing account 300's name from 'White' to 'Williams'. The updated records are written to a temporary file, temp_file.txt.

In [6]:
with open('accounts.txt', mode='w') as accounts:
    accounts.write('100 Jones 24.98\n')
    accounts.write('200 Doe 345.67\n')
    accounts.write('300 White 0.00\n')
    accounts.write('400 Stone -42.16\n')
    accounts.write('500 Rich 224.62\n')

In [12]:
accounts = open('accounts.txt','r')
temp_file = open('temp_file.txt','w')
with accounts, temp_file:
    for record in accounts:
        account, name, balance = record.split()
        if account != '300':
            temp_file.write(record)
        else:
            new_record = ' '.join([account, 'Williams', balance])
            temp_file.write(new_record + '\n')

### os Module File-Processing Functions
- The *os module* provides functions for interacting with the operating system.
- *remove function*: deletes the original file
- *rename function*: renames the temporary file to the original file's name
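
The remove-then-rename step might look like the sketch below. It assumes both files live in a temporary directory created just for this example, so the paths are illustrative rather than the notebook's own working directory.

```python
import os
import tempfile

# Hypothetical working directory so real files are not touched.
work_dir = tempfile.mkdtemp()
accounts_path = os.path.join(work_dir, 'accounts.txt')
temp_path = os.path.join(work_dir, 'temp_file.txt')

with open(accounts_path, 'w') as accounts:
    accounts.write('300 White 0.00\n')       # original record
with open(temp_path, 'w') as temp_file:
    temp_file.write('300 Williams 0.00\n')   # updated record

os.remove(accounts_path)              # delete the original file
os.rename(temp_path, accounts_path)   # give the temporary file the original name

with open(accounts_path, 'r') as accounts:
    updated = accounts.read()

print(updated, end='')
print(os.path.exists(temp_path))
```

After these two calls only accounts.txt remains, and it holds the updated record.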

#### Self Check
1. The os module's _*remove*_ and _*rename*_ functions delete a file and specify a new name for a file, respectively.
2. True/False - Formatted data in a text file can be updated in place because records and their fields are fixed in size. 
    - False. Such data cannot be modified without the risk of destroying other data in the file, because records and their fields can vary in size. 
3. In the accounts.txt file, update the last name 'Doe' to 'Smith'.

In [13]:
accounts = open('accounts.txt','r')
temp_file = open('temp_file.txt','w')
with accounts, temp_file:
    for record in accounts:
        account, name, balance = record.split()
        if name != 'Doe':
            temp_file.write(record)
        else:
            new_record = ' '.join([account, 'Smith', balance])
            temp_file.write(new_record + '\n')

## Serialization with JSON
- JSON objects are similar to Python dictionaries. Each JSON object contains a comma-separated list of property names and values in curly braces.
- JSON also supports arrays, which are comma-separated values in square brackets, like Python lists.
- The **json module** enables you to convert objects to JSON text format, known as **serializing** the data. 


In [None]:
accounts_dict = {'accounts' : [{'account': 100, 'name' : 'Jones', 'balance' : 24.98},{'account':200, 'name': 'Doe', 'balance' : 345.67}]}

In [None]:
import json

with open('accounts.json','w') as accounts:
    json.dump(accounts_dict, accounts)

- The json module's **dump()** function serializes the dictionary accounts_dict into the file.
- The json module's **load()** function reads the entire JSON contents of its file object argument and converts the JSON into a Python object, known as **deserializing** the data.
- **dumps** returns a Python string representation of an object in JSON format.
#### Self Check
1. Converting objects to JSON text format is known as _*serialization*_ and reconstructing the original Python object from the JSON text is known as _*deserialization*_.
2. True/False - JSON is both a human-readable and computer-readable format that makes it convenient to send and receive objects across the Internet.
    - **True**
3. Create a JSON file named grades.json and write into it the following dictionary

In [None]:
import json
grades_dict = {'gradebook': [{'student_id': 1, 'name': 'Red', 'grade': 'A'}, {'student_id': 2, 'name' : 'Green', 'grade': 'B'}, {'student_id': 3, 'name': 'White', 'grade':'A'}]}

with open('grades.json','w') as grades: 
    json.dump(grades_dict, grades)
with open('grades.json','r') as grades:
    print(json.dumps(json.load(grades), indent=4))

## pickle Serialization and Deserialization
- The **pickle module** can serialize objects into a Python-specific data format. 
    - Pickle files are not secure. If you receive a raw pickle file over the network, do not trust it. It could contain malicious code that would run arbitrary Python when you try to unpickle it. 
    - Using pickle is not recommended, but it has been used for many years, so you're likely to encounter it in legacy code. 

## Additional Notes Regarding Files
- 'r' : open a text file for reading
- 'w' : open a text file for writing, existing file contents are deleted. 
- 'a' : open a text file for appending at the end, creating the file if it does not exist. New data is written at the end of the file. 
- 'r+' : open a text file for reading and writing
- 'w+' : open a text file for reading and writing. Existing file contents are deleted.
- 'a+' : open a text file for reading and appending at the end. New data is written at the end of the file. If the file does not exist, it is created. 
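
The difference between 'w' and 'a' can be checked with a small sketch, assuming a scratch file log_demo.txt in a temporary directory (an illustrative name, not from the chapter).

```python
import os
import tempfile

# Hypothetical scratch file for comparing modes.
log_path = os.path.join(tempfile.mkdtemp(), 'log_demo.txt')

with open(log_path, 'w') as log:   # 'w' creates the file
    log.write('first\n')

with open(log_path, 'a') as log:   # 'a' keeps existing contents and adds at the end
    log.write('second\n')

with open(log_path, 'r') as log:
    after_append = log.read()

with open(log_path, 'w') as log:   # 'w' again: the previous contents are deleted
    log.write('third\n')

with open(log_path, 'r') as log:
    after_rewrite = log.read()

print(repr(after_append))
print(repr(after_rewrite))
```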

## Handling Exceptions

- **FileNotFoundError** occurs if you attempt to open a non-existent file for reading. 
- **PermissionError** occurs if you attempt an operation for which you do not have permission. 
- **ValueError** occurs when you attempt to write to a file that has already been closed. 
- When an error occurs at execution time, the code is said to **raise an exception**.

### try Statements
- **try statements** enable exception handling.
    - **try clause**: keyword try followed by a suite of statements that *might* raise exceptions.
### except Clause
- The try clause may be followed by one or more **except clauses** that immediately follow the try clause's suite. These are known as exception handlers.
### else Clause
- The optional **else clause** specifies code that should execute only if the code in the try suite did not raise exceptions. 
- The point at which an exception occurs is called the **raise point**.
- To catch multiple exception types in one except clause, list them in a tuple:
    - except (type1, type2, ...) as *variable_name*:
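
The clauses above fit together as in this minimal sketch. The function safe_divide is a hypothetical helper written for illustration; it handles two exception types in one except clause and uses else for the success path.

```python
def safe_divide(numerator_text, denominator_text):
    """Return a message describing the quotient or the problem (hypothetical helper)."""
    try:
        numerator = int(numerator_text)       # may raise ValueError
        denominator = int(denominator_text)   # may raise ValueError
        result = numerator / denominator      # may raise ZeroDivisionError
    except (ValueError, ZeroDivisionError) as error:
        # One handler for two exception types, listed as a tuple.
        return f'failed: {type(error).__name__}'
    else:
        # Runs only when the try suite raised no exception.
        return f'result: {result}'

print(safe_divide('10', '4'))
print(safe_divide('ten', '4'))
print(safe_divide('10', '0'))
```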

#### Self Check
1. The statement that raises an exception is sometimes called the _*raise point*_ of the exception
2. True/False - In Python, it's possible to return to the raise point of an exception via keyword return. 
    - **False** Program control continues from the first statement after the try statement in which the exception was handled. 
3. Before executing the IPython session, determine what the following function displays if you call it with the value 10.7 and then the value 'Python'.

In [None]:
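def try_it(value):
    try:
        x = int(value)
    except ValueError:
        print(f'{value} could not be converted to an integer')
    else:
        # else belongs to the try statement, so it aligns with try and except
        print(f'int({value}) is {x}')

try_it(10.7)      # int() truncates the float, so the else suite runs
try_it('Python')  # int() raises ValueError, so the except suite runs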

## finally Clause

- Closing a file helps prevent a **resource leak**.
- The **finally clause** is guaranteed to execute *regardless* of whether its try suite executes successfully or an exception occurs.
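
The guarantee can be observed with a small sketch. The function read_number and the events list are hypothetical names used only to record which clauses run on the success path and on the failure path.

```python
events = []

def read_number(text):
    """Hypothetical helper that records which clauses execute."""
    try:
        events.append('try')
        return int(text)
    except ValueError:
        events.append('except')
        return None
    finally:
        # Executes on both paths, even though each path returns.
        events.append('finally')

good = read_number('42')
bad = read_number('forty-two')

print(good, bad)
print(events)
```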

#### Self Check
1. True/False - If a finally clause appears in a function, that finally clause is guaranteed to execute when the function executes, regardless of whether the function raises an exception. 
    - **False** The finally clause will execute only if program control enters the corresponding try suite.
2. Closing a file helps prevent a _*resource leak*_ in which the file resource is not available to other programs because a program using the file never closes it. 

## Explicitly Raising an Exception
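- The **raise statement** explicitly raises an exception. In its simplest form, keyword raise is followed by an exception class name and an optional argument list.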
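
As a minimal sketch, the hypothetical function withdraw below raises a ValueError when the request cannot be satisfied, and the caller handles it with a try statement.

```python
def withdraw(balance, amount):
    """Hypothetical helper: return the new balance or raise ValueError."""
    if amount > balance:
        raise ValueError(f'cannot withdraw {amount}; balance is only {balance}')
    return balance - amount

new_balance = withdraw(100, 30)

try:
    withdraw(100, 500)
except ValueError as error:
    message = str(error)   # the text passed to ValueError above

print(new_balance)
print(message)
```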

#### Self Check
1. Use the _*raise*_ statement to indicate that a problem occurred at execution time. 

# Chapter 17 - Big Data: Hadoop, Spark, NoSQL and IoT
- [Databases](#Databases)
- [Relational Databases and Structured Query Language](#Relational-Databases-and-Structured-Query-Language)
- [Viewing Contents](#Viewing-Contents)
- [SQL Keywords](#SQL-Keywords)


## Databases

- **Relational databases** - store **structured data** in tables with a fixed-size number of columns per row
    - manipulated with **SQL**
- Most data today is **unstructured data** like Facebook posts and tweets, or **semi-structured** like JSON and XML documents. 
- Big data is handled by new databases:
    - **NoSQL Databases** - key-value, document, columnar and graph databases.

#### Self Check
1. _*Relational*_ databases store structured data in tables with a fixed-size number of columns per row and are manipulated via Structured Query Language. 
2. Most data produced today is _*unstructured*_ data, like the content of Facebook posts and Twitter tweets, or _*semi-structured*_ data like JSON and XML documents. 
3. Cloud vendors focus on _*service-oriented architecture (SOA)*_ technology in which they provide "as-a-Service" capabilities that applications connect to and use in the cloud. 

## Relational Databases and Structured Query Language

- SQL is used almost universally with relational database systems to manipulate data and perform queries, which request information that satisfies given criteria.
- Tables are composed of **rows**, each describing a single entity. Rows are composed of **columns** containing individual attribute values.
- **Primary key** - a column (or group of columns) whose value is unique for each row.

#### Self Check
1. A table in a relational database consists of _*rows*_ and _*columns*_.
2. The _*primary*_ key uniquely identifies each record in a table. 
3. True/False - Python's Database Application Programming Interface (DB-API) specifies common object and method names for manipulating any database. 
    - **True**

### Using the sqlite3 Module to Set Up a Relational Database in SQLite
- **CRUD** operations: create, read, update, and delete. 
- Connecting to the Database in Python
    import sqlite3
    connection = sqlite3.connect('books.db')
- To work with the database in Python, call the sqlite3 module's **connect function**, which returns a **Connection object**.

### Viewing Contents

In [3]:
import sqlite3

connection = sqlite3.connect('books.db')

import pandas as pd

pd.options.display.max_columns = 10

pd.read_sql('SELECT * FROM authors', connection, index_col=['id'])

- Pandas function **read_sql** executes a SQL query and returns a DataFrame containing the query's results. 
- **SELECT** query gets rows and columns from one or more tables
- The **asterisk (*) wildcard** selects all columns.
- **FROM** specifies the table from which to get the data.

- A **foreign key** is a column in one table that matches the primary-key column in another table. The **rule of referential integrity** states that every foreign-key value must appear as the primary-key value in a row of the other table.
- Foreign keys also allow related data in multiple tables to be selected and combined, which is known as **joining** the data.
    - A primary key and its corresponding foreign key have a **one-to-many relationship**.
- A **many-to-many relationship** between two tables can be represented by creating an additional table that links them. 
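
The foreign-key ideas above can be sketched with an in-memory SQLite database. The two tables and their rows are invented for illustration and are not the schema of books.db. Note that SQLite enforces foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

connection = sqlite3.connect(':memory:')         # throwaway database, not books.db
connection.execute('PRAGMA foreign_keys = ON')   # enable referential-integrity checks

# Hypothetical schema: one author can have many books (one-to-many).
connection.execute('CREATE TABLE authors (id INTEGER PRIMARY KEY, last TEXT)')
connection.execute('''CREATE TABLE books (
    isbn TEXT PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES authors(id))''')

connection.executemany('INSERT INTO authors VALUES (?, ?)',
                       [(1, 'Jones'), (2, 'Doe')])
connection.executemany('INSERT INTO books VALUES (?, ?, ?)',
                       [('111', 'Book A', 1), ('222', 'Book B', 1), ('333', 'Book C', 2)])

# Join the tables by matching the primary key to the foreign key.
joined = connection.execute('''SELECT authors.last, books.title
    FROM authors INNER JOIN books ON authors.id = books.author_id
    ORDER BY authors.last, books.title''').fetchall()

# A foreign-key value with no matching primary key breaks referential integrity.
try:
    connection.execute("INSERT INTO books VALUES ('444', 'Orphan', 99)")
    violation_blocked = False
except sqlite3.IntegrityError:
    violation_blocked = True

connection.close()
print(joined)
print(violation_blocked)
```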

### SQL Keywords
- SELECT
    - Retrieves data from one or many tables
- FROM
    - Specifies tables, required in every SELECT
- WHERE
    - Criteria for selection - optional
- GROUP BY
    - Criteria for grouping - optional
- ORDER BY
    - Criteria for ordering rows - optional
- INNER JOIN
    - Merge rows from multiple tables
- INSERT
    - Insert rows into specified table
- UPDATE
    - Update rows in specified table
- DELETE
    - Delete rows from a specified table
- LIKE
    - Used for pattern matching (clause for WHERE)
- ON
    - Uses primary key and foreign key to determine rows to merge from each table (join clause)
- NOT
    - Reverses a condition in a WHERE clause
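
Several of these keywords can be tried in one short sketch. It assumes an in-memory SQLite table named accounts, loaded with the records used earlier for accounts.txt; the table itself is hypothetical and separate from books.db.

```python
import sqlite3

connection = sqlite3.connect(':memory:')   # throwaway database for illustration
cursor = connection.cursor()
cursor.execute('CREATE TABLE accounts (account INTEGER PRIMARY KEY, name TEXT, balance REAL)')

# INSERT: add rows to the table.
cursor.executemany('INSERT INTO accounts VALUES (?, ?, ?)',
                   [(100, 'Jones', 24.98), (200, 'Doe', 345.67),
                    (300, 'White', 0.0), (400, 'Stone', -42.16)])

# UPDATE with WHERE: change only the matching row.
cursor.execute("UPDATE accounts SET name = 'Williams' WHERE account = 300")

# DELETE with WHERE: remove rows with a negative balance.
cursor.execute('DELETE FROM accounts WHERE balance < 0')

# SELECT with WHERE, LIKE and ORDER BY: names that start with W.
w_names = cursor.execute(
    "SELECT name FROM accounts WHERE name LIKE 'W%' ORDER BY name").fetchall()

# NOT reverses the LIKE condition; DESC sorts from highest to lowest balance.
not_w = cursor.execute(
    "SELECT account, name FROM accounts WHERE name NOT LIKE 'W%' "
    "ORDER BY balance DESC").fetchall()

connection.close()
print(w_names)
print(not_w)
```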

#### Self Check
1. SQL keyword _*WHERE*_ is followed by the selection criteria that specify the records to select in a query. 
2. SQL keyword _*ORDER BY*_ specifies the order in which records are sorted in a query. 
3. A _*qualified name*_ specifies the fields from multiple tables that should be compared to join the tables. 
