# Using libraries in Python to access more functions

All of the functions in previous notebooks are built into Python. But you can access thousands more functions from **libraries** that people have created (sometimes called **modules**).

A **library** is a collection of functions and other code that someone has created - typically to solve a particular problem. For example:

* The `pandas` library was created to solve problems relating to data analysis
* The `matplotlib` library was created to add extra visualisation functionality to Python
* The `scraperwiki` library was created to solve scraping problems. It's not the only one: Beautiful Soup is another library for this, too.
* The `pdf2xml` library was designed to help deal with PDFs in Python (by converting them to XML)
* The `re` library provides functions to use **regular expressions** in Python, in order to describe patterns you might want to match (and fetch) in text or webpages.
* The `os` library was [designed](https://docs.python.org/3/library/os.html) to add "operating system dependent functionality" such as reading or writing files.
* The `Scikit-learn` library allows you to use machine learning in Python
* The `NumPy` library has lots of functions to do more advanced mathematical processing

A good principle to bear in mind when using programming languages is that if you come across a problem, chances are that someone else has already come across a similar problem - and created a library to solve it. A bit of googling around can help you find that library, and learn how to use it to help solve your problem.

To use a library's functions you need to **import** it.

This is done using the conveniently named `import` statement (a built-in part of Python, rather than a function), followed by the name of the library.

In [None]:
#import the pandas library to use its functions
import pandas
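Two of the libraries listed above, `re` and `os`, come with Python itself, so they don't need installing, but they still have to be imported before you can use them. Here is a minimal sketch of what each one adds; the phone-number-style text and the file name are made-up examples.

```python
import os
import re

text = "Call 01234 567890 or 09876 543210"

# re: find every chunk of text matching a pattern (5 digits, a space, then 6 digits)
numbers = re.findall(r"\d{5} \d{6}", text)
print(numbers)  # ['01234 567890', '09876 543210']

# os: work with file and folder paths, e.g. split a file name from its extension
name, extension = os.path.splitext("california_housing_test.csv")
print(extension)  # .csv
```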

## Getting the library name right - and `ModuleNotFoundError`

Some libraries have names that you need to be careful to get right. If you try to import the machine learning library scikit-learn, for example, this code will give you an error:

In [None]:
#import the machine learning library scikit-learn
import scikit-learn

SyntaxError: invalid syntax

In this case it's giving an `invalid syntax` error because a dash (hyphen) isn't allowed in the name you give to `import`.

We can try removing that, so at least the name of the library isn't breaking those rules, but we'll still get a (different) error:

In [None]:
#import the machine learning library scikit-learn
import scikitlearn

ModuleNotFoundError: No module named 'scikitlearn'

This `ModuleNotFoundError` means you've either named the library (module) incorrectly, or it hasn't been installed (more on this below). Either way, Python doesn't know where to find the thing you're asking it to import.

If this happens, search Google for "import" plus the name of the library to find out what code is used to import it.

You may have to scroll past code relating to *installing* the library (more on this in a moment) - the key is to look for the line of code where `import` is first used. 

In this case [the code to import scikit-learn](https://www.journaldev.com/18341/python-scikit-learn-tutorial) refers to it as `sklearn`:

In [None]:
#import the machine learning library scikit-learn
import sklearn
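The two errors above can be told apart with a little code. This is a minimal sketch that assumes you only want a rough check: `diagnose()` is a hypothetical helper, not part of any library. It uses the string method `isidentifier()` (plus the `keyword` module) to spot names like `scikit-learn` that can't be written after `import`, and `importlib.util.find_spec()`, which returns `None` when Python can't find a module with that name.

```python
import importlib.util
import keyword


def diagnose(name):
    """Rough check of a top-level library name (hypothetical helper)."""
    # Dashes, spaces and Python keywords can't appear as a name after `import`
    if not name.isidentifier() or keyword.iskeyword(name):
        return "invalid name: would cause a SyntaxError"
    # find_spec returns None when Python can't locate a module with this name
    if importlib.util.find_spec(name) is None:
        return "not found: would cause a ModuleNotFoundError"
    return "found: import should work"


for name in ["scikit-learn", "scikitlearn", "json"]:
    print(name, "->", diagnose(name))
```

Try `diagnose("sklearn")` as well: the answer depends on whether scikit-learn is installed wherever your code is running.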

## Installing libraries

Sometimes the `ModuleNotFoundError` is caused by the fact that the library is not **installed**.

The most commonly used libraries, like `pandas` and `matplotlib`, are pre-installed in whatever you're using to do your coding (in this case, Colab notebooks), so you only need to use the `import` command to start using them.

But some less common libraries will need installing first.

Here's what happens, for example, when you try to import the scraping library `scraperwiki`:

In [None]:
#import the scraping library scraperwiki
import scraperwiki

ModuleNotFoundError: No module named 'scraperwiki'

This is the same error again - but a quick google will tell us that we've not spelt it incorrectly. 

Instead we need to look at the bottom part of the error message: 

> `If your import is failing due to a missing package, you can manually install dependencies using either !pip or !apt.`

This is exactly what we need to do. 

To install a library (the "package" referred to in the message), you use `!pip install` followed by the name of the library, like so:

In [None]:
#install the scraping library scraperwiki
!pip install scraperwiki

Collecting scraperwiki
  Downloading scraperwiki-0.5.1.tar.gz (7.7 kB)
Collecting alembic
  Downloading alembic-1.7.5-py3-none-any.whl (209 kB)
     |████████████████████████████████| 209 kB 7.6 MB/s
Collecting Mako
  Downloading Mako-1.1.6-py2.py3-none-any.whl (75 kB)
     |████████████████████████████████| 75 kB 3.7 MB/s
Building wheels for collected packages: scraperwiki
  Building wheel for scraperwiki (setup.py) ... done
  Created wheel for scraperwiki: filename=scraperwiki-0.5.1-py3-none-any.whl size=6545 sha256=c93f9f534f1506a2368a3d21699a62ab4b7181a8a9cce99d108db57ac86079bf
  Stored in directory: /root/.cache/pip/wheels/3c/57/8d/41e15f7e5cc9eb0067539416abd445f210c0d04f39975d5ca5
Successfully built scraperwiki
Installing collected packages: Mako, alembic, scraperwiki
Successfully installed Mako-1.1.6 alembic-1.7.5 scraperwiki-0.5.1


As you can see above, this sets off a bunch of lines of output as the library is fetched ("Collecting scraperwiki"), any "dependencies" (other libraries that this one relies on, here `Mako` and `alembic`) are downloaded, and then everything is installed.

Once that's done you can then run the `import` command without an error - because the library is now installed.

In [None]:
#import the scraping library scraperwiki
import scraperwiki
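The `!pip install` line works in notebooks like Colab, where `!` runs a shell command, but not in an ordinary Python script. Below is a sketch of the same idea in plain Python. `ensure_installed()` is a hypothetical helper, and it assumes `pip` is available for the Python interpreter you're running. It also allows for libraries whose install name differs from their import name.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_installed(import_name, pip_name=None):
    """Import a library, installing it with pip first if Python can't find it.

    Hypothetical helper: pip_name is only needed when the name you install
    differs from the name you import.
    """
    if importlib.util.find_spec(import_name) is None:
        # Same job as `!pip install` in a notebook cell: run pip using the
        # same Python interpreter that is running this code
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or import_name]
        )
        # Make sure Python notices the newly installed library
        importlib.invalidate_caches()
    # Equivalent to an `import` statement, but returns the library as a value
    return importlib.import_module(import_name)
```

`scraperwiki = ensure_installed("scraperwiki")` would match the example above, and `sklearn = ensure_installed("sklearn", pip_name="scikit-learn")` covers scikit-learn, which is installed as `scikit-learn` but imported as `sklearn`.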

## Importing `from` a library

Sometimes you will come across tutorials where the library appears to be imported 'from' another library. 

For example, if you look at any code using the scraping library Beautiful Soup you might see a line like this:

In [None]:
#import the scraping library beautiful soup
from bs4 import BeautifulSoup

In this case, we are actually only importing *part* of the Beautiful Soup library. And even more confusingly, that part is called `BeautifulSoup`.

*(Before I continue, it's worth pointing out that you don't need to understand what's going on in this code for it to work for you. It's quite common to copy and apply code which works without necessarily being able to explain why)*

Here's what's happening:

* The Beautiful Soup library is actually called `bs4`
* Within that, there's a part (technically a **class**) called `BeautifulSoup`

Instead of importing the entire Beautiful Soup library (`bs4`), this code just imports part of it (`BeautifulSoup`).

Why? Well, [largely for convenience](http://www.wellho.net/mouth/418_Difference-between-import-and-from-in-Python.html): it means we don't have to write `bs4.BeautifulSoup()` ("use `BeautifulSoup` from `bs4`") every time. Instead we can just write `BeautifulSoup()` and Python will know what we mean.

So, if you see this code being suggested it probably means it's just easier to use that library in that way.

You can [read more about this, including advantages and disadvantages, in this post](https://stackabuse.com/relative-vs-absolute-imports-in-python/)
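Beautiful Soup may not be installed everywhere, so this sketch shows the same pattern with `Counter` from `collections`, a library that comes with Python. Both styles give you exactly the same thing; `from ... import` just saves typing the library name each time.

```python
import collections
from collections import Counter

words = ["soup", "scrape", "soup"]

# Full name: the library, a period, then the part we want
counts_long = collections.Counter(words)

# After `from collections import Counter` we can use the short name directly
counts_short = Counter(words)

print(counts_short["soup"])               # 2
print(Counter is collections.Counter)     # True: the very same thing
```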

## Renaming a library while importing: `as`

Another thing you will come across often in code and tutorials is libraries that are imported 'as' another name. 

A common example is `pandas`. Here's how the `pandas` library is normally imported:

In [None]:
#import pandas, rename it as 'pd'
import pandas as pd

What's happening here? Well, again, it's all about convenience. 

Every time you use a function from a library you need to specify the name of the library it comes from. For example, to use the `read_csv()` function from `pandas` you would write `pandas.read_csv()`.

Renaming the library as 'pd' just means we only have to type two characters every time we use it, rather than six ('pandas'). So: `pd.read_csv()`.

You don't have to rename the library `as pd` - it's a choice. But if the code you're following has done it then it makes sense to stick with that, as you'll be less likely to encounter errors where your code differs from the example you're following.
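The same renaming works with any library. Here is a small sketch using `statistics`, which comes with Python, renamed `as stats` (the short name here is just our choice, not a widely used convention like `pd`).

```python
import statistics
import statistics as stats

# `stats` is now just a shorter name for the statistics library
print(stats.mean([2, 4, 6]))   # 4

# Both names point at the very same library
print(stats is statistics)     # True
```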



## Using functions from a library

As mentioned above, to use any function from a library you'll typically need to name the library first, followed by the function from that library, with the name of the library and function joined by a period, like so:

In [None]:
pd.read_csv("sample_data/california_housing_test.csv")

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.05 | 37.37 | 27.0 | 3885.0 | 661.0 | 1537.0 | 606.0 | 6.6085 | 344700.0 |
| 1 | -118.30 | 34.26 | 43.0 | 1510.0 | 310.0 | 809.0 | 277.0 | 3.5990 | 176500.0 |
| 2 | -117.81 | 33.78 | 27.0 | 3589.0 | 507.0 | 1484.0 | 495.0 | 5.7934 | 270500.0 |
| 3 | -118.36 | 33.82 | 28.0 | 67.0 | 15.0 | 49.0 | 11.0 | 6.1359 | 330000.0 |
| 4 | -119.67 | 36.33 | 19.0 | 1241.0 | 244.0 | 850.0 | 237.0 | 2.9375 | 81700.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | -119.86 | 34.42 | 23.0 | 1450.0 | 642.0 | 1258.0 | 607.0 | 1.1790 | 225000.0 |
| 2996 | -118.14 | 34.06 | 27.0 | 5257.0 | 1082.0 | 3496.0 | 1036.0 | 3.3906 | 237200.0 |
| 2997 | -119.70 | 36.30 | 10.0 | 956.0 | 201.0 | 693.0 | 220.0 | 2.2895 | 62000.0 |
| 2998 | -117.12 | 34.10 | 40.0 | 96.0 | 14.0 | 46.0 | 14.0 | 3.2708 | 162500.0 |


Don't worry about this particular function and what it's doing - I just want to explain the structure of the line of code.

In this case, then, we are using the `read_csv()` function from the `pandas` library. 

Because the `pandas` library was imported `as pd`, it's that name (`pd`) that we use when calling its function. So:

`pd.read_csv()`

with the ingredients we want to give that function placed inside the parentheses (here, the location of a CSV file).
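The same structure works for any library. A minimal sketch using `math`, which comes with Python:

```python
import math

# library name, a period, the function name, then the ingredients in parentheses
print(math.sqrt(16))     # 4.0
print(math.floor(3.7))   # 3
```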

## When a function is used but a library is not named: methods

Sometimes you might see a function being used in a different way: instead of being attached to the library name with a period, it will be attached to a variable with a period. 

Here's an example, where the variable in question is a dataframe called `df`, and the function is `head()` which shows the first few rows of a dataframe:

`df.head()`

Again, you don't have to understand this code in order to use it or get it to work - but if you're curious why the pandas function `head()` isn't prefixed by `pd.`, here's an explanation.

One of the features of many libraries is their ability to create certain types of 'objects'. The `pandas` library, for example, creates a special type of object called a **dataframe**. 

Objects have properties: a dataframe, for example, has rows and columns, and column names.

But those properties can also include **functions**.

In fact, strictly speaking, they're not functions: they're **methods**. But they work in much the same way, and it's easy to confuse the two.

When an object's properties include a method, you can call that method to get information about the object (or to do something with it).

In the example code `df.head()` the `df` part is a dataframe, which includes the method `head()`. By attaching `head()` to `df` using a period, we are asking it to 'call the head function from df' in the same way as we might say 'call the read_csv function from pandas'. 

This is a bit mind-bending so it may take a bit of playing around to get your head around - and you may find it easier to ignore this piece of knowledge entirely, because it won't stop you using code effectively (I coded for years without really understanding this subtle difference). 
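To make the idea of a method more concrete, here is a toy sketch. `Table` is a made-up class, NOT the pandas dataframe: it just stores some rows and has a `head()` method, so `df.head(2)` works in the same way as the dataframe example above. The last line calls the method "from the type", which is one way to see that `head()` belongs to `Table` objects rather than to a library.

```python
class Table:
    """A toy stand-in for a dataframe (hypothetical, not pandas)."""

    def __init__(self, rows):
        # A property: the data itself
        self.rows = rows

    def head(self, n=5):
        """A method: return the first n rows of this table."""
        return self.rows[:n]


# Made-up example data
df = Table([["a", 1], ["b", 2], ["c", 3]])

# Method attached to the object with a period: no library name needed
print(df.head(2))          # [['a', 1], ['b', 2]]

# The same call written "from the type": head() belongs to Table objects
print(Table.head(df, 2))   # [['a', 1], ['b', 2]]
```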
