---
<center><h1> Lesson 1 - Crash course in Python</h1></center>
---
---

<center><h1>Part 5. Import and Work with Files</h1></center>

---

## Table of Contents
- [Importing Libraries and Modules](#Importing-Libraries-and-Modules)
    * [Creation of Your Own Modules and Packages](#Creation-of-Your-Own-Modules-and-Packages)
- [Files and Printing](#Files-and-Printing)
    - [*Exercise 5.1*](#Exercise-5.1)

---
# Importing Libraries and Modules

One of the greatest strengths of the Python programming language is its rich set of libraries: pre-written code that implements a wide variety of functionality. For the data scientist, Python's libraries (also called "modules") are particularly valuable. With a little research into the details of Python's libraries, many common data tasks are little more than a function call away. Libraries exist for data cleaning, analysis, visualization, machine learning, and statistics.

[This XKCD cartoon](http://xkcd.com/353/) pretty much summarizes what Python libraries can do...

In order to have access to a library's functionality in a block of code, you must first import it. Importing a library tells Python that, while executing your code, it should consider not only the code and functions that you have written, but also the code and functions in the libraries that you have imported.

There are several ways to import modules in Python, and some have better properties than others. Below we see the preferred general ways to import modules. In documentation, you may see other ways to import libraries (`from a_library import foo`). There is no risk in just copying such a pattern if it is known to work.

Imagine I want to import a library called `some_python_library`. This can be done using the import commands. All code below that import statement has access to the library contents.

+ `import some_python_library`: imports the module `some_python_library`, and creates a reference to that module in the current namespace. Or in other words, after you’ve run this statement, you can use `some_python_library.name` to refer to things defined in module `some_python_library`

+ `import some_python_library as plib`: imports the module `some_python_library` and sets an alias for that library that may be easier to refer to. To refer to a thing defined in the library `some_python_library`, use `plib.name`

+ `from some_python_library import sub_library`: imports the module `sub_library` from the package of libraries `some_python_library`

+ `import library_1, library_2, library_3, ...`: imports several modules in a single line

+ `from some_python_library import *`: imports all public names from a module into the current namespace. Use it sparingly, because the imported names can silently overwrite names you have already defined

In practice you'll see the second pattern used very frequently; `pandas` referred to as `pd`, `numpy` referred to as `np`, etc. 
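
As a small sketch of the patterns above, the following code uses only standard-library modules (`math` and `urllib`, chosen here purely for illustration). The variable names are hypothetical; the point is that each import form reaches the same underlying functions.

```python
# 1. Plain import: names are reached through the module name
import math
root_plain = math.sqrt(16)

# 2. Import with an alias: same module, shorter name
import math as m
root_alias = m.sqrt(16)

# 3. Import a module from a package (urllib is a package, parse is a module in it)
from urllib import parse
quoted = parse.quote('a b')

# 4. Import a single name directly into the current namespace
from math import sqrt
root_direct = sqrt(16)

print (root_plain, root_alias, root_direct)  # 4.0 4.0 4.0
print (quoted)                               # a%20b
```

Note that `math` and `m` refer to the very same module object; the alias only changes the name you type.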

Let's look at a few examples of importing some commonly used libraries:

* [`math`](https://docs.python.org/2/library/math.html) contains many mathematical operations

In [1]:
import math

number = 2
print (math.sqrt(number))
print (math.log(number))
print (math.factorial(10))

1.4142135623730951
0.6931471805599453
3628800


* [`datetime`](https://docs.python.org/2/library/datetime.html) contains functions and classes for working with dates and times, separately and together

In [2]:
from datetime import datetime as d

now = d.now()
print (now)
print ('year:', now.year)
print ('month:', now.month)
print ('day:', now.day)
print ('week day:', now.weekday())
print ('hour:', now.hour)
print ('minute:', now.minute)
print ('second:', now.second)
print ('microsecond:', now.microsecond)

print ('Today  :', d.today())
print ('UTC Now:', d.utcnow())

2019-04-23 13:20:59.565750
year: 2019
month: 4
day: 23
week day: 1
hour: 13
minute: 20
second: 59
microsecond: 565750
Today  : 2019-04-23 13:20:59.566483
UTC Now: 2019-04-23 10:20:59.566533


* [`os`](https://docs.python.org/2/library/os.html) provides a portable way of using operating system dependent functionality

In [3]:
import os

# prints all files and folders in the current directory
for file in os.listdir("./"):
    print (file)
    
print ()
# builds OS paths
for parts in [ ('one', 'two', 'three'),
               ('/', 'one', 'two', 'three'),
               ('/one', '/two', '/three'),
              ]:
    print (parts, ':', os.path.join(*parts))

Lesson 1. Part 4. User Defined Functions.ipynb
Lesson 2. Part 3. Work with pandas DataFrames - main operations, sorting and selecting by type.ipynb
Lesson 2. Part 1. Introduction to pandas data structures.ipynb
Lesson 6. Part 2. SQL syntax and basic commands.ipynb
Lesson 6. Part 1. What is SQL - Connection to SQL servers.ipynb
Lesson 2. Part 6. Work with pandas DataFrames - reshaping and pivot tables.ipynb
Lesson 8.2. NoSQL with Python - Mongo.ipynb
Lesson 1. Part 5. Import and Work with Files.ipynb
Lesson 5. Advanced topics with sklearn.ipynb
Vagrantfile
Lesson 2. Part 4. Work with pandas DataFrames - grouping.ipynb
Lesson 3.1. Part 1. NumPy.ipynb
Lesson 7.2. NoSQL with Python - Neo4j.ipynb
Lesson 8.1. Work with Twitter API in Python.ipynb
Lesson 2. Part 5. Work with pandas DataFramesb - join, merge and concatenate.ipynb
test
Lesson 4. Part 2. Machine learning general overview. Classification.ipynb
Lesson 1. Part 7. Basic Object Oriented Programming in Python.ipynb
Lesson 1. Part 1. P

The list of Standard Python Libraries can be found [here](https://docs.python.org/2/library/)

### Creation of Your Own Modules and Packages

[[back to top]](#Table-of-Contents)

Creating Python modules is something that most Python programmers do every day. Any time you save a new Python script, you have created a new module. You can import your module into other modules. A package is a collection of modules. The things you import into your scripts from the standard library are modules or packages. In this section, we’ll be looking at how to create modules and packages. We’ll spend more time on packages since they’re more complicated than modules.

We will begin by creating a super simple module. This module will provide us with basic arithmetic and no error handling. 

Note: to test this example, you first need to run all of the cells above in this section.
Here’s our first example:

In [4]:
def add(x, y):
    return x + y
 
def division(x, y):
    return x / y
 
def multiply(x, y):
    return x * y
 
def subtract(x, y):
    return x - y

In [5]:
%save my_math.py _ih[4]
# Note: the number in brackets [] must match x in the label In [x] of the cell above that defines the functions

The following commands were written to file `my_math.py`:
def add(x, y):
    return x + y

def division(x, y):
    return x / y

def multiply(x, y):
    return x * y

def subtract(x, y):
    return x - y


The code above creates a file `my_math.py` in the current folder. Check that the file exists and contains the four functions.

Let's write a little script that imports our module and runs the functions in it:

In [7]:
import my_math as m
print ("Add 5 and 8")
print (m.add(5, 8))
print ("Subtract 5 from 10")
print (m.subtract(10, 5))
print ("Divide 2 by 7")
print (m.division(2, 7))
print ("Multiply 12 by 6")
print (m.multiply(12, 6))

Add 5 and 8
13
Subtract 5 from 10
5
Divide 2 by 7
0.2857142857142857
Multiply 12 by 6
72

The main difference between a module and a package is that a package is a collection of modules AND it has an `__init__.py` file. Depending on the complexity of the package, it may have more than one `__init__.py`. Let’s take a look at a simple folder structure to make this more obvious, then we’ll create some simple code to follow that structure.

Let's create a new folder "arithmetic", in which we'll collect our modules

In [8]:
os.makedirs('./arithmetic', exist_ok=True)  # does nothing if the folder already exists

In [9]:
def add(x, y):
    return x + y

In [10]:
%save arithmetic/addition.py _ih[9]

The following commands were written to file `arithmetic/addition.py`:
def add(x, y):
    return x + y


In [11]:
def division(x, y):
    return x / y

In [12]:
%save arithmetic/division.py _ih[11]

The following commands were written to file `arithmetic/division.py`:
def division(x, y):
    return x / y


In [13]:
def multiply(x, y):
    return x * y

In [14]:
%save arithmetic/multiplication.py _ih[13]

The following commands were written to file `arithmetic/multiplication.py`:
def multiply(x, y):
    return x * y


In [15]:
def subtract(x, y):
    return x - y

In [16]:
%save arithmetic/subtraction.py _ih[15]

The following commands were written to file `arithmetic/subtraction.py`:
def subtract(x, y):
    return x - y


We should also create an empty Python file `__init__.py`. The `open` command used for this is covered in the next section.

In [17]:
open('arithmetic/__init__.py','a').close()

"arithmetic" folder (your own package) has the following structure

    arithmetic/
        __init__.py
        addition.py
        division.py
        multiplication.py
        subtraction.py

In [18]:
from arithmetic import addition, division, multiplication, subtraction
print ("Import functions from package")
print ("Add 5 and 8")
print (addition.add(5, 8))
print ("Subtract 5 from 10")
print (subtraction.subtract(10, 5))
print ("Divide 2 by 7")
print (division.division(2, 7))
print ("Multiply 12 by 6")
print (multiplication.multiply(12, 6))

Import functions from package
Add 5 and 8
13
Subtract 5 from 10
5
Divide 2 by 7
0.2857142857142857
Multiply 12 by 6
72
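
The `__init__.py` above is empty, but it does not have to be. As a hedged sketch (not part of the original example), the code below builds a throwaway package in a temporary folder and uses `__init__.py` to re-export a function, so that it can be called directly on the package. The package name `arithmetic_demo` and the temporary location are assumptions made for this illustration only.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package in a temporary folder (hypothetical names)
base_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(base_dir, 'arithmetic_demo')
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, 'addition.py'), 'w') as f:
    f.write("def add(x, y):\n    return x + y\n")

# __init__.py runs when the package is imported;
# here it re-exports add() at the package level
with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    f.write("from .addition import add\n")

# Make the temporary folder importable, then import the new package
sys.path.insert(0, base_dir)
importlib.invalidate_caches()
import arithmetic_demo

result = arithmetic_demo.add(5, 8)
print (result)  # 13
```

With this layout, users of the package can write `arithmetic_demo.add(...)` without knowing which module inside the package defines the function.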


---
## Files and Printing

[[back to top]](#Table-of-Contents)

You'll often be reading data from a file, or writing the output of your Python scripts back into a file. Python makes this easy: open the file in the appropriate mode with the `open` function, then read or write as needed. The `open` function is usually called with two arguments, the name of the file and the mode. The mode is a short string that specifies whether you are going to read from the file, write to it, or append to the end of an existing file; if the mode is omitted, the file is opened for reading. The function returns a file object that you use for all subsequent operations: `a_file = open(filename, mode)`. The modes are:

+ `'r'`: open a file for reading
+ `'w'`: open a file for writing. Caution: this will overwrite any previously existing file
+ `'a'`: append. Write to the end of a file. 
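
A minimal sketch of how the three modes differ is given below. It writes to a temporary folder so that no existing file is touched; the file name `modes_demo.txt` and the variable names are hypothetical.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is overwritten
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

f = open(demo_path, 'w')   # 'w' creates the file (or empties an existing one)
f.write('first\n')
f.close()

f = open(demo_path, 'a')   # 'a' keeps the content and writes at the end
f.write('second\n')
f.close()

f = open(demo_path, 'r')   # 'r' opens the file for reading only
after_append = f.read()
f.close()

f = open(demo_path, 'w')   # reopening with 'w' discards the old content
f.write('third\n')
f.close()

f = open(demo_path, 'r')
after_overwrite = f.read()
f.close()

print (repr(after_append))     # 'first\nsecond\n'
print (repr(after_overwrite))  # 'third\n'
```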

When reading, you typically want to iterate through the lines in a file using a `for` loop, as the examples below do. Some other common methods for dealing with files are: 

+ `file.read()`: read the entire contents of a file into a string
+ `file.readline()`: read one line of a file
+ `file.readlines()`: read all lines of a file and collect them as a list
+ `file.write(some_string)`: writes to the file. Note that this doesn't automatically add a newline. Also note that writes are sometimes buffered: Python may wait until several writes are pending and then perform them all at once
+ `file.flush()`: write out any buffered writes
+ `file.close()`: close the open file. This will free up some computer resources occupied by keeping a file open.
+ `file.seek(position)`: moves to a specific position within a file. Note that position is specified in bytes. 
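
The reading methods listed above can be compared side by side in a short sketch. The file name `methods_demo.txt` and the variable names are hypothetical, and the file is created in a temporary folder.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'methods_demo.txt')

f = open(demo_path, 'w')
f.write('one\n')      # write() adds no newline of its own
f.write('two\n')
f.write('three\n')
f.flush()             # push any buffered writes to the file
f.close()

f = open(demo_path, 'r')
first_line = f.readline()    # reads up to and including the first newline
rest_lines = f.readlines()   # reads the remaining lines into a list
f.seek(0)                    # jump back to the start of the file
whole_text = f.read()        # reads everything into one string
f.close()

print (repr(first_line))  # 'one\n'
print (rest_lines)        # ['two\n', 'three\n']
print (repr(whole_text))  # 'one\ntwo\nthree\n'
```

Each read continues from where the previous one stopped, which is why `readlines()` returns only two lines here and why `seek(0)` is needed before reading the whole file again.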

Here is an example using files:

In [19]:
print ("Open file")
A_file = open("temp.txt", "w")
A_list = ["a", "b", "c", "d"]
A_set = {1, 2, 3, 4}
print ("write rows to file")
for x in A_list:
    A_file.write("letter: %s\n" % x)
    # print ("letter: %s\n" % x)
for n in A_set:
    A_file.write("number: %d\n" % n)
    # print ("number: %d\n" % n)
print ("Flush data and close file")
A_file.flush()
A_file.close()

Open file
write rows to file
Flush data and close file


In [20]:
print ("Print file data through bash")
!cat temp.txt

Print file data through bash
letter: a
letter: b
letter: c
letter: d
number: 1
number: 2
number: 3
number: 4


In [21]:
print ("Print file data through python")
file_2 = open("temp.txt", "r")
for line in file_2:
    print (line) # note that this doesn't strip off the newlines
file_2.close()

Print file data through python
letter: a

letter: b

letter: c

letter: d

number: 1

number: 2

number: 3

number: 4



Another way of working with file objects is the `with` statement, and using it is good practice. 
With the `with` statement, you get cleaner syntax and better exception handling. 

In addition, it automatically closes the file when the block ends, even if an error occurs inside it. The `with` statement is a way of ensuring that clean-up code always runs.

In [22]:
# write a new file
with open("hello.txt", "w") as f:
    f.write("Hello World\nIt is a  new line\nAnd again a line\nThe end")
    
# read the "hello.txt" file
with open("hello.txt") as f:
    data = f.readlines()
    for line in data:
        print (line)

Hello World

It is a  new line

And again a line

The end
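
To illustrate the clean-up guarantee, here is a small sketch in which an error is raised on purpose inside a `with` block. The file name `with_demo.txt` and the simulated `ValueError` are assumptions made for the illustration.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

try:
    with open(demo_path, 'w') as f:
        f.write('partial data\n')
        raise ValueError('something went wrong')  # simulated failure
except ValueError as err:
    print ('caught:', err)

# The file object still exists after the block, but it has been closed
closed_after_error = f.closed
print ('file closed:', closed_after_error)  # file closed: True
```

Without `with`, the same guarantee would require an explicit `try`/`finally` around the `open` and `close` calls.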


In [23]:
file_3 = open("temp.txt", "r")
content = file_3.read()
print (content)
file_3.close()

letter: a
letter: b
letter: c
letter: d
number: 1
number: 2
number: 3
number: 4



In [24]:
# filter rows: skip every line that contains the letter "t"
file_4 = open("temp.txt", "r")
for line in file_4:
    if "t" in line:
        continue
    print (line.strip()) # remove the extra newline.
file_4.close()

number: 1
number: 2
number: 3
number: 4


In [25]:
!cat temp.txt

letter: a
letter: b
letter: c
letter: d
number: 1
number: 2
number: 3
number: 4


In [26]:
# filter columns
file_5 = open("temp.txt", "r")
for line in file_5:
    columns = line.strip().split(": ") # create a list by splitting the line on the ": " separator
    print ("_".join(columns)) # prints the columns as a string, using the "_" char as a separator
    if columns[1] != "b": # if the second element of the list is NOT b, 
        print (columns) # then print the list

file_5.close()

letter_a
['letter', 'a']
letter_b
letter_c
['letter', 'c']
letter_d
['letter', 'd']
number_1
['number', '1']
number_2
['number', '2']
number_3
['number', '3']
number_4
['number', '4']


Commonly used file formats for storing large numbers of rows of data are [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) and [JSON](https://en.wikipedia.org/wiki/JSON). 

Python provides libraries for writing, reading and updating these (and many other) kinds of files:
* [`csv`](https://docs.python.org/2/library/csv.html) library implements tools to read and write tabular data in CSV format 
* [`json`](https://docs.python.org/2/library/json.html) library allows users to work with JSON files

In [27]:
import csv

# Writing CSV data
with open('test.csv', 'w', newline='') as f:  # newline='' avoids extra blank rows on some platforms
    writer = csv.writer(f)
    writer.writerow( ('NO', 'Letter', 'Date') )
    for i in range(10):
        writer.writerow( (i+1, chr(ord('a') + i), '08/%02d/07' % (i+1)) )
        
# Reading data back
with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print (row)

['NO', 'Letter', 'Date']
['1', 'a', '08/01/07']
['2', 'b', '08/02/07']
['3', 'c', '08/03/07']
['4', 'd', '08/04/07']
['5', 'e', '08/05/07']
['6', 'f', '08/06/07']
['7', 'g', '08/07/07']
['8', 'h', '08/08/07']
['9', 'i', '08/09/07']
['10', 'j', '08/10/07']


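`chr` and `ord` are [built-in Python functions](https://docs.python.org/2/library/functions.html) that convert between a character and its integer code: `ord` returns the code of a character, and `chr` performs the reverse conversion.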
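
A short sketch of how these two functions produce the letters used in the CSV example (the variable names are hypothetical):

```python
code_a = ord('a')            # integer code of the character 'a'
letter_d = chr(code_a + 3)   # the character three positions after 'a'

# The same expression as in the CSV example, collected into a list
first_ten = [chr(ord('a') + i) for i in range(10)]

print (code_a)     # 97
print (letter_d)   # d
print (first_ten)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```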

In [28]:
import json
import random

# converts str type to a list
l = list('abcdefgh')

# Writing JSON data
with open('test.json', 'w') as f:
    random.shuffle(l)
    letters = ''.join(l)
    data = {
            "NO": random.randint(0, 100),
            "letters": letters,
            "choice": random.choice('abcdefgh'),
            "rounded": round(random.random(), 3)
        }
    json.dump(data, f)

# Reading data back
with open('test.json', 'r') as f:
    print (json.load(f))

{'NO': 19, 'letters': 'ebhgfcad', 'choice': 'd', 'rounded': 0.651}


Above we have used the [`random`](https://docs.python.org/2/library/random.html) module, which implements pseudo-random number generators for various distributions, and the built-in function [`round(n, m)`](https://docs.python.org/2/library/functions.html#round), which returns the floating point value `n` rounded to `m` digits after the decimal point.
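
Because the values are pseudo-random, the JSON output above changes on every run. As a hedged sketch, the code below fixes the generator's seed with `random.seed` to make a run repeatable, and converts a record to JSON text and back in memory with `json.dumps` and `json.loads`. The seed value `42` and the variable names are arbitrary choices for the illustration.

```python
import json
import random

random.seed(42)                          # arbitrary seed, fixes the sequence
value = round(random.random(), 3)

random.seed(42)                          # same seed, so the same sequence again
same_value = round(random.random(), 3)

record = {"rounded": value, "letters": "abc"}
text = json.dumps(record)                # dictionary -> JSON string
restored = json.loads(text)              # JSON string -> dictionary

print (value == same_value)  # True
print (restored == record)   # True
```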

>### Exercise 5.1

>The command below will create a file called `phonetest.txt`

>``` 
>    %%file phonetest.txt
>    679-397-5255
>    2126660921
>    212-998-0902
>    888-888-2222
>    800-555-1211
>    800 555 1212
>    800.555.1213
>    (800) 555-1214
>    1-800-555-1215
>    1(800)555-1216
>    800-555-1212-1234
>    800-555-1212x1234
>    800-555-1212 ext. 1234
>    work 1-(800) 555.121
>```
    
>* Overwrite `phonetest.txt` with its lines in the opposite order (i.e. the first line should be written last, etc.) using Python functionality. Then read the content of the rewritten file into the `content` string variable, adding to the end of each row its row number

In [None]:
# type your code here

In [None]:
from test_helper import Test

Test.assertEqualsHashed(content, '8c4be466aa5f8bbeb7d13c7cde2f06cf7d696ae8', 'Incorrect value of "content"', 
                        "Exercise 5.1 is successful")

<center><h3>Presented by <a target="_blank" href="http://datascience-school.com">datascience-school.com</a></h3></center>