### Working with strings

When the list of files is short like this one, it's not difficult to find the ones we want, but if the list were longer, we might need some help. If we're only interested in files that deal with Mexico, we can search for file names beginning with mexico-city-real-estate-. To do this, we'll use the glob.glob function.

In [1]:
import glob
glob.glob("./data/mexico-city-real-estate-[0-9].csv")

['./data/mexico-city-real-estate-5.csv',
 './data/mexico-city-real-estate-4.csv',
 './data/mexico-city-real-estate-1.csv',
 './data/mexico-city-real-estate-3.csv',
 './data/mexico-city-real-estate-2.csv']

The glob.glob function matches file names against a pattern. In this example, [0-9] matches any single digit from 0 to 9, but glob.glob supports other patterns as well. Here are a few of the more common ones:

- \* Match any number of characters, including none
- ? Match exactly one character of any kind
- [a-z] Match one lowercase letter from a to z
- [A-Z] Match one uppercase letter from A to Z
- [!a-z] Match one character that is not a lowercase letter from a to z

In [2]:
glob.glob("./data/mexico-city*")

['./data/mexico-city-real-estate-5.csv',
 './data/mexico-city-real-estate-4.csv',
 './data/mexico-city-real-estate-1.csv',
 './data/mexico-city-real-estate-3.csv',
 './data/mexico-city-real-estate-2.csv']
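
The ? and [!...] patterns from the list above can be tried out in the same way. The following is a minimal sketch that builds a temporary directory with made-up file names (they are assumptions for illustration, not the real contents of ./data), so it runs anywhere. Results are sorted because glob.glob does not guarantee any particular order.

```python
import glob
import os
import tempfile

# Hypothetical file names, created only for this illustration.
names = [
    "mexico-city-real-estate-1.csv",
    "mexico-city-real-estate-2.csv",
    "mexico-city-real-estate-a.csv",
    "brasil-real-estate-1.csv",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create empty files so that glob has something to match.
    for name in names:
        open(os.path.join(tmp_dir, name), "w").close()

    def matching_names(pattern):
        # Escape the directory part so only `pattern` is treated as a pattern.
        full_pattern = os.path.join(glob.escape(tmp_dir), pattern)
        return sorted(os.path.basename(path) for path in glob.glob(full_pattern))

    # "?" matches exactly one character of any kind.
    single_char = matching_names("mexico-city-real-estate-?.csv")
    # "[!0-9]" matches exactly one character that is not a digit.
    non_digit = matching_names("mexico-city-real-estate-[!0-9].csv")

print(single_char)
print(non_digit)
```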

### Manipulating Strings

In [3]:
file_name = "mexico-city-real-estate-1"
file_name.split("-")

['mexico', 'city', 'real', 'estate', '1']

In [4]:
file_name = "mexico-city-real-estate-1"

modified_file_name = file_name.replace("-", "_")

modified_file_name

'mexico_city_real_estate_1'

In [5]:
import datetime

python_birthday = datetime.datetime(year=1991, month=2, day=20)
print(
    f"Python first appeared on {python_birthday:%B %d} in the year {python_birthday:%Y}."
)

now = datetime.datetime.now()
print(f"Python is {now.year - python_birthday.year} years old.")

Python first appeared on February 20 in the year 1991.
Python is 33 years old.


### List Comprehension

List comprehension is used to iterate through lists without explicitly writing loops, which is especially useful for filtering data according to a specific condition.

In [6]:
price_mexican_pesos = [
    35000000.0,
    2000000.0,
    2700000.0,
    6347000.0,
    6994543.16,
    6617835.61,
    670000.0,
]

In [7]:
price_colombian_pesos = []
for price in price_mexican_pesos:
    price_colombian_pesos.append(price * 190)

print(price_colombian_pesos)

[6650000000.0, 380000000.0, 513000000.0, 1205930000.0, 1328963200.4, 1257388765.9, 127300000.0]


In [8]:
price_colombian_pesos = [price * 190 for price in price_mexican_pesos]
print(price_colombian_pesos)

[6650000000.0, 380000000.0, 513000000.0, 1205930000.0, 1328963200.4, 1257388765.9, 127300000.0]
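
A list comprehension can also filter, by adding an if clause at the end. The sketch below reuses the price list from above; the 5,000,000 peso threshold and the variable name affordable_prices are arbitrary choices made for illustration.

```python
price_mexican_pesos = [
    35000000.0,
    2000000.0,
    2700000.0,
    6347000.0,
    6994543.16,
    6617835.61,
    670000.0,
]

# Keep only the prices below a hypothetical threshold of 5,000,000 pesos.
affordable_prices = [price for price in price_mexican_pesos if price < 5_000_000]
print(affordable_prices)  # [2000000.0, 2700000.0, 670000.0]
```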


### Lambda Functions

The function definitions we've been working with so far are fine for most purposes, but they can easily become a little long. When that happens, you might want a shorter way of expressing a function; that's what lambda functions are for. Here's code for a function that adds 3 to a number.

In [9]:
add_three = lambda a: a + 3
add_three(5)

8
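
Lambda functions are often passed directly to another function instead of being given a name. As a sketch, the example below uses one as the key argument of the built-in sorted function to order file names by their trailing number. The list of names is made up for illustration.

```python
# Hypothetical file names, deliberately out of order.
file_names = [
    "mexico-city-real-estate-5",
    "mexico-city-real-estate-1",
    "mexico-city-real-estate-3",
]

# The lambda extracts the text after the last "-" and converts it to an int,
# so the names are sorted by number rather than alphabetically.
sorted_names = sorted(file_names, key=lambda name: int(name.split("-")[-1]))
print(sorted_names)
```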

### Working with errors

Error handling is an important part of coding: it lets our code respond gracefully to edge cases instead of crashing. In Python, we handle errors with try and except blocks. Let's create a function to demonstrate how this works.

We start with a function that calculates the quotient of two numbers. The function takes two inputs, nominator (the numerator) and denominator, and works only when:

- both inputs are numbers
- the denominator is not zero

We can use try and except to make sure the function handles invalid inputs without crashing.

In [10]:
def get_quotient(nominator, denominator):
    try:
        quotient = nominator / denominator
        return quotient
    except:  # noQA E722
        return print("function not working")

In [11]:
get_quotient(1,2)

0.5

In [2]:
def get_quotient(nominator, denominator):
    try:
        quotient = nominator / denominator
        return quotient
    except Exception as e:
        return print(e)

In [3]:
get_quotient(1,0)

division by zero
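
Each of the two conditions listed earlier corresponds to a specific exception type, so one possible refinement is to catch them separately. This is a sketch rather than the only way to do it; the function name get_quotient_safe and the messages are chosen here for illustration.

```python
def get_quotient_safe(nominator, denominator):
    try:
        return nominator / denominator
    except ZeroDivisionError:
        # Raised when the denominator is zero.
        print("denominator must not be zero")
    except TypeError:
        # Raised when an input does not support division, e.g. a string.
        print("both inputs must be numbers")
    # Reached only if one of the exceptions above was handled.
    return None


print(get_quotient_safe(1, 2))
print(get_quotient_safe(1, 0))
print(get_quotient_safe("a", 2))
```

Catching specific exceptions keeps unrelated errors visible instead of silently swallowing them.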


### Create files using Context Manager

A context manager allocates and releases resources precisely when you want it to. Context managers are most often used through the with statement. Suppose you have two related operations that you'd like to execute as a pair, such as opening and closing a file, with a block of code in between. A context manager lets you do exactly that. For example:

In [12]:
with open("data/example.txt", "w") as f:
    f.write("Hello")
    f.write("\n")
    f.write("Hola")
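
The same pattern works for reading, and the file is closed automatically as soon as the with block ends. The sketch below writes to a temporary directory instead of data/ (an assumption made so that it runs anywhere) and then reads the file back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # Write two lines; the file is closed when this block ends.
    with open(path, "w") as f:
        f.write("Hello\nHola")

    # Read the file back with a second context manager.
    with open(path, "r") as f:
        lines = f.read().splitlines()

    # Outside the with block, the file object is already closed.
    file_closed = f.closed

print(lines)  # ['Hello', 'Hola']
print(file_closed)  # True
```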

### Saving and Loading Files with joblib

We can also use joblib's dump and load functions to save and load data. The same functions can save and load trained models for later use. Let's look at an example. First, we import joblib and train a model on the iris dataset:

In [4]:
from joblib import dump, load
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = svm.SVC()
clf.fit(X, y)

In [19]:
# Saving model to a path
dump(clf, "data/trained_model.pkl")

['data/trained_model.pkl']

In [20]:
# Load the model from a path and make predictions again

model = load("data/trained_model.pkl", mmap_mode="r")

model.predict(X)

memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

### Working with Filepaths

A filepath is the location of a specific file. Python's os module lets us work with path names and access files in the local directory. Here is a common use case of the os module:

`os.getcwd()` returns the current working directory.
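
A minimal sketch of os.getcwd together with a few os.path helpers follows. The data/example.txt path is only an illustration and does not need to exist, because these helpers work on the path string itself.

```python
import os

# Absolute path of the directory the code is running in.
cwd = os.getcwd()
print(cwd)

# Build a path in a platform-independent way (hypothetical file location).
data_path = os.path.join(cwd, "data", "example.txt")

# Split the path into useful pieces.
file_name = os.path.basename(data_path)
extension = os.path.splitext(data_path)[1]

print(file_name)  # example.txt
print(extension)  # .txt
```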

### Testing Code

Python has a very useful built-in function, isinstance, that checks the type of an object. The following examples check whether any of the inputs is a string, and then whether all of the inputs are numbers.

In [24]:
inputs = [1, 2]

any(isinstance(x, str) for x in inputs)

False

In [25]:
all(isinstance(x, (int, float)) for x in inputs)

True

You can add an assert statement to your code to make sure you are using the right input data before the rest of the code runs:

In [7]:
assert isinstance(1, int)

If the expression after assert is True, the code continues and nothing is printed. If it is False, Python raises an AssertionError, which you can use to debug your code.
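
An assert statement can also carry a message that is shown when the check fails. As a sketch, the hypothetical function below guards its input with assert and reuses the conversion rate of 190 from the list comprehension example. The try and except block is there only to display the message without stopping the notebook.

```python
def to_colombian_pesos(price, rate=190):
    # Fail early, with a readable message, if the input is not numeric.
    assert isinstance(price, (int, float)), "price must be a number"
    return price * rate


print(to_colombian_pesos(2000000.0))

try:
    to_colombian_pesos("2000000")  # a string, so the assertion fails
except AssertionError as error:
    message = str(error)
    print(message)
```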

Another useful function is hasattr, which checks whether an object has a given attribute. The object can be an instance of a built-in class or of a custom class. In the following example, we check whether an instance of a custom Cat class has the age attribute. The function returns a Boolean, either True or False:

In [28]:
class Cat:
    age = 3
    name = "Lily"

cat = Cat()

hasattr(cat, "age")

True
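
A related built-in, getattr, reads an attribute and can fall back to a default when the attribute is missing. The sketch below repeats the Cat class so that it runs on its own; the color attribute and the "unknown" default are made up for illustration.

```python
class Cat:
    age = 3
    name = "Lily"


cat = Cat()

# hasattr returns False for an attribute the object does not have.
has_color = hasattr(cat, "color")

# getattr returns the attribute value, or the default if it is missing.
age = getattr(cat, "age", None)
color = getattr(cat, "color", "unknown")

print(has_color, age, color)
```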