# 🎄 Python Advent Calendar 🎄
## [Day 3: Transform Your Data Transformations](https://py-advent-calendar.beehiiv.com/p/day-3-transform-data-transformations)
_🔗 [Read the newsletter here](https://py-advent-calendar.beehiiv.com/p/day-3-transform-data-transformations)_

*Behind today's door is a remarkable library packed with so much functionality that I'm seriously considering writing a RAG-based LLM to help identify all the places my code could be simplified by switching out my hand-carved functions for their convenience methods. Introducing... **PyJanitor**!*

<img src="./ac-3-door.png" width=400>

---


## [PyJanitor](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL3B5amFuaXRvci1kZXZzLmdpdGh1Yi5pby9weWphbml0b3IvP3V0bV9zb3VyY2U9cHktYWR2ZW50LWNhbGVuZGFyLmJlZWhpaXYuY29tJnV0bV9tZWRpdW09cmVmZXJyYWwmdXRtX2NhbXBhaWduPWRheS0zLXRyYW5zZm9ybS15b3VyLWRhdGEtdHJhbnNmb3JtYXRpb25zIiwicG9zdF9pZCI6Ijc2NWVkMWViLWM4YmUtNDEwZS1hNjUxLTUzNWU3ZGJiMjAzZiIsInB1YmxpY2F0aW9uX2lkIjoiMjQxMWMwYzYtOWVlNi00NDU0LWEwZGEtMWI2YzViNDhlNDRiIiwidmlzaXRfdG9rZW4iOiI4YWViMzEyMi1mY2I5LTRmNjYtODNmNy01ZWYzYzk0YTFlNWUiLCJpYXQiOjE3MDE2NDk0MDAsImlzcyI6Im9yY2hpZCJ9.yjCUF04fatkWZbiWDlO4P4csPPenqyp-QJjUnLXyEus): data cleaning pandas extensions

📆 Last updated: November 2023\
⬇️  Downloads: 9,701/week\
⚖️  License: MIT\
🐍 [PyPI](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL3B5cGkub3JnL3Byb2plY3QvcHlqYW5pdG9yLz91dG1fc291cmNlPXB5LWFkdmVudC1jYWxlbmRhci5iZWVoaWl2LmNvbSZ1dG1fbWVkaXVtPXJlZmVycmFsJnV0bV9jYW1wYWlnbj1kYXktMy10cmFuc2Zvcm0teW91ci1kYXRhLXRyYW5zZm9ybWF0aW9ucyIsInBvc3RfaWQiOiI3NjVlZDFlYi1jOGJlLTQxMGUtYTY1MS01MzVlN2RiYjIwM2YiLCJwdWJsaWNhdGlvbl9pZCI6IjI0MTFjMGM2LTllZTYtNDQ1NC1hMGRhLTFiNmM1YjQ4ZTQ0YiIsInZpc2l0X3Rva2VuIjoiOGFlYjMxMjItZmNiOS00ZjY2LTgzZjctNWVmM2M5NGExZTVlIiwiaWF0IjoxNzAxNjQ5NDAwLCJpc3MiOiJvcmNoaWQifQ.ofprpaPasIB3qEl29NVkPF8vgJHKuJm2cowo6_Inn7Q) |  ⭐ [GitHub Stars: 1.2k](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL2dpdGh1Yi5jb20vcHlqYW5pdG9yLWRldnMvcHlqYW5pdG9yP3V0bV9zb3VyY2U9cHktYWR2ZW50LWNhbGVuZGFyLmJlZWhpaXYuY29tJnV0bV9tZWRpdW09cmVmZXJyYWwmdXRtX2NhbXBhaWduPWRheS0zLXRyYW5zZm9ybS15b3VyLWRhdGEtdHJhbnNmb3JtYXRpb25zIiwicG9zdF9pZCI6Ijc2NWVkMWViLWM4YmUtNDEwZS1hNjUxLTUzNWU3ZGJiMjAzZiIsInB1YmxpY2F0aW9uX2lkIjoiMjQxMWMwYzYtOWVlNi00NDU0LWEwZGEtMWI2YzViNDhlNDRiIiwidmlzaXRfdG9rZW4iOiI4YWViMzEyMi1mY2I5LTRmNjYtODNmNy01ZWYzYzk0YTFlNWUiLCJpYXQiOjE3MDE2NDk0MDAsImlzcyI6Im9yY2hpZCJ9.Mv6RY4Z5KIYDW9hI2GnHv7ZrUcYqo0LuM10WdyXF2Z8)

### 🔍 What is it?

PyJanitor extends pandas with many custom methods for common data cleaning & preprocessing tasks. Originally a port of R's [janitor package](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL2NyYW4uci1wcm9qZWN0Lm9yZy93ZWIvcGFja2FnZXMvamFuaXRvci92aWduZXR0ZXMvamFuaXRvci5odG1sP3V0bV9zb3VyY2U9cHktYWR2ZW50LWNhbGVuZGFyLmJlZWhpaXYuY29tJnV0bV9tZWRpdW09cmVmZXJyYWwmdXRtX2NhbXBhaWduPWRheS0zLXRyYW5zZm9ybS15b3VyLWRhdGEtdHJhbnNmb3JtYXRpb25zIiwicG9zdF9pZCI6Ijc2NWVkMWViLWM4YmUtNDEwZS1hNjUxLTUzNWU3ZGJiMjAzZiIsInB1YmxpY2F0aW9uX2lkIjoiMjQxMWMwYzYtOWVlNi00NDU0LWEwZGEtMWI2YzViNDhlNDRiIiwidmlzaXRfdG9rZW4iOiI4YWViMzEyMi1mY2I5LTRmNjYtODNmNy01ZWYzYzk0YTFlNWUiLCJpYXQiOjE3MDE2NDk0MDAsImlzcyI6Im9yY2hpZCJ9.ZowSWm_F33Hx8CNUft2Wb7kCkM87qWvZjuCpjyEdIac), PyJanitor aims to create a consistent API for data cleaning with an emphasis on enabling the "method chaining" paradigm.

Beyond the [built-in functions](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL3B5amFuaXRvci1kZXZzLmdpdGh1Yi5pby9weWphbml0b3IvYXBpL2Z1bmN0aW9ucy8_dXRtX3NvdXJjZT1weS1hZHZlbnQtY2FsZW5kYXIuYmVlaGlpdi5jb20mdXRtX21lZGl1bT1yZWZlcnJhbCZ1dG1fY2FtcGFpZ249ZGF5LTMtdHJhbnNmb3JtLXlvdXItZGF0YS10cmFuc2Zvcm1hdGlvbnMiLCJwb3N0X2lkIjoiNzY1ZWQxZWItYzhiZS00MTBlLWE2NTEtNTM1ZTdkYmIyMDNmIiwicHVibGljYXRpb25faWQiOiIyNDExYzBjNi05ZWU2LTQ0NTQtYTBkYS0xYjZjNWI0OGU0NGIiLCJ2aXNpdF90b2tlbiI6IjhhZWIzMTIyLWZjYjktNGY2Ni04M2Y3LTVlZjNjOTRhMWU1ZSIsImlhdCI6MTcwMTY0OTQwMCwiaXNzIjoib3JjaGlkIn0.QRiUgBoRmP_kv8VlZ-TAZd4IHrtoq8BNMJMngxsJoUg) (of which only a small selection is highlighted below), PyJanitor also includes convenience data processing functions specific to biology, chemistry, engineering, finance, machine learning, mathematics, and time-series processing.
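PyJanitor's methods appear on every DataFrame once `janitor` is imported; under the hood it registers them using the pandas-flavor package. Without PyJanitor, the nearest pandas-only way to keep a hand-carved function inside a method chain is `DataFrame.pipe()`. The sketch below assumes a hypothetical helper, `add_lead`, which is not part of PyJanitor or pandas; the numbers are borrowed from the first rows of PollBase.

```python
import pandas as pd


def add_lead(df: pd.DataFrame, leader: str, challenger: str) -> pd.DataFrame:
    """Hypothetical helper: add a 'lead' column (leader minus challenger)."""
    return df.assign(lead=df[leader] - df[challenger])


polls = pd.DataFrame(
    {
        "Conservative": [43.5, 43.0],
        "Labour": [28.3, 27.0],
    }
)

# .pipe(func, *args) calls func(df, *args), so a custom function slots into the chain
result = (
    polls
    .rename(columns=str.lower)
    .pipe(add_lead, "conservative", "labour")
    .round(1)
)
print(result)
```

PyJanitor's approach removes even the `.pipe()` wrapper: its functions are called directly as DataFrame methods.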

---

### 📦 Install

```
pip install pyjanitor
```

---

### 🛠️ Use

#### 1. Load the data

Today's examples build off [Mark Pack's PollBase](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL3d3dy5tYXJrcGFjay5vcmcudWsvb3Bpbmlvbi1wb2xscy8_dXRtX3NvdXJjZT1weS1hZHZlbnQtY2FsZW5kYXIuYmVlaGlpdi5jb20mdXRtX21lZGl1bT1yZWZlcnJhbCZ1dG1fY2FtcGFpZ249ZGF5LTMtdHJhbnNmb3JtLXlvdXItZGF0YS10cmFuc2Zvcm1hdGlvbnMiLCJwb3N0X2lkIjoiNzY1ZWQxZWItYzhiZS00MTBlLWE2NTEtNTM1ZTdkYmIyMDNmIiwicHVibGljYXRpb25faWQiOiIyNDExYzBjNi05ZWU2LTQ0NTQtYTBkYS0xYjZjNWI0OGU0NGIiLCJ2aXNpdF90b2tlbiI6IjhhZWIzMTIyLWZjYjktNGY2Ni04M2Y3LTVlZjNjOTRhMWU1ZSIsImlhdCI6MTcwMTY0OTQwMCwiaXNzIjoib3JjaGlkIn0.jaJw9C4TqeOs45lcD3ST99aadZgmSTP4Ge0LWiSNDK0), a database of UK voting intention opinion polls between 1943 and 2023. Here's how it looks once read into pandas:

```python
import pandas as pd

# !wget https://www.markpack.org.uk/files/2023/10/PollBase-Q3-2023.xlsx

df = pd.read_excel("PollBase-Q3-2023.xlsx", sheet_name="Monthly average")
df.head()
```



| | Unnamed: 0 | Unnamed: 1 | Conservative | Unnamed: 3 | Labour | Unnamed: 5 | LD | Unnamed: 7 | UKIP | Unnamed: 9 | SDP | TIG | Unnamed: 12 | BXP | Unnamed: 14 | Green | Unnamed: 16 | Unnamed: 17 | Lead | Unnamed: 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | GE | 43.5 | | 28.3 | | 26.0 | | | | | | | | | | | | 15.2 | |
| 1 | 1983.0 | 1983-06-01 00:00:00 | 43.0 | | 27.0 | | 29.0 | | | | | | | | | | | | 16.0 | |
| 2 | | 1983-07-01 00:00:00 | 44.8 | 1.8 | 28.633333 | 1.633333 | 25.1 | -3.9 | | | | | | | | | | | 16.166667 | 0.166667 |
| 3 | | 1983-08-01 00:00:00 | 45.65 | 0.85 | 27.375 | -1.258333 | 25.525 | 0.425 | | | | | | | | | | | 18.275 | 2.108333 |
| 4 | | 1983-09-01 00:00:00 | 45.533333 | -0.116667 | 26.666667 | -0.708333 | 26.266667 | 0.741667 | | | | | | | | | | | 18.866667 | 0.591667 |


#### 2\. Remove empty rows/columns

It needs a lot of cleaning! Importing `janitor` registers PyJanitor's methods on pandas DataFrames, so let's import it and use the `.remove_empty()` method to remove any rows/columns that are entirely empty:

```python
import janitor

print(df.shape)
df = df.remove_empty()
print(df.shape)
```

```
(515, 20)
(515, 19)
```
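For comparison, a rough pandas-only counterpart of `.remove_empty()`, assuming "empty" means every value is missing (NaN/None), chains two `dropna(how="all")` calls. This is a sketch, not PyJanitor's implementation, and the toy DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (made up): the middle row and the "Unnamed: 3" column are entirely missing
toy = pd.DataFrame(
    {
        "Conservative": [43.5, np.nan, 43.0],
        "Unnamed: 3": [np.nan, np.nan, np.nan],
        "Labour": [28.3, np.nan, 27.0],
    }
)

# Drop rows where every value is missing, then columns where every value is missing
trimmed = toy.dropna(axis="index", how="all").dropna(axis="columns", how="all")
print(toy.shape, "->", trimmed.shape)
```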


#### 3\. Eliminate extraneous columns

Next, let's remove all columns with names like `Unnamed: 0`. These mostly hold the month-on-month change in polling for each party, which could easily be derived from the raw numbers using pandas' `.diff()` method if required. We first rename the second column to `Date` so that it isn't dropped too, and we also drop the `Lead` column.

Note: this operation, like most of PyJanitor's functions, can easily be done using pandas' native `.drop()` method. However, the `.select_columns()` method offers more powerful filtering options, such as shell-style wildcard patterns (`"Unnamed*"`) and inverting the selection with `invert=True`:

```python
df = (
    df.rename(columns={"Unnamed: 1": "Date"})
    .select_columns("Unnamed*", invert=True)
    .select_columns("Lead", invert=True)
)
```

```python
df.head()
```

| | Date | Conservative | Labour | LD | UKIP | SDP | TIG | BXP | Green |
|---|---|---|---|---|---|---|---|---|---|
| 0 | GE | 43.5 | 28.3 | 26.0 | | | | | |
| 1 | 1983-06-01 00:00:00 | 43.0 | 27.0 | 29.0 | | | | | |
| 2 | 1983-07-01 00:00:00 | 44.8 | 28.633333 | 25.1 | | | | | |
| 3 | 1983-08-01 00:00:00 | 45.65 | 27.375 | 25.525 | | | | | |
| 4 | 1983-09-01 00:00:00 | 45.533333 | 26.666667 | 26.266667 | | | | | |
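To make the comparison with pandas concrete, here is a sketch of the same column selection without PyJanitor: once with a shell-style pattern via the standard library's `fnmatch`, and once with the native `.drop()` method. The empty toy DataFrame only carries a subset of the PollBase column names and is an assumption for illustration.

```python
import fnmatch

import pandas as pd

# Toy frame (made up): no rows, just PollBase-style column names
toy = pd.DataFrame(
    columns=["Unnamed: 0", "Unnamed: 1", "Conservative", "Unnamed: 3", "Labour", "Lead"]
)
renamed = toy.rename(columns={"Unnamed: 1": "Date"})

# Keep columns whose names do NOT match the case-sensitive pattern "Unnamed*",
# and leave out "Lead" explicitly
keep = [
    col
    for col in renamed.columns
    if not fnmatch.fnmatchcase(col, "Unnamed*") and col != "Lead"
]
selected = renamed[keep]

# The same result with pandas' native .drop()
dropped = renamed.drop(columns=[*renamed.filter(like="Unnamed").columns, "Lead"])

print(list(selected.columns))
```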


#### 4\. Auto-clean column names

This yields a much simpler DataFrame. Next, let's use the rather handy `.clean_names()` method to simplify & lower-case the column names:

```python
df = df.clean_names()
df.head()
```

| | date | conservative | labour | ld | ukip | sdp | tig | bxp | green |
|---|---|---|---|---|---|---|---|---|---|
| 0 | GE | 43.5 | 28.3 | 26.0 | | | | | |
| 1 | 1983-06-01 00:00:00 | 43.0 | 27.0 | 29.0 | | | | | |
| 2 | 1983-07-01 00:00:00 | 44.8 | 28.633333 | 25.1 | | | | | |
| 3 | 1983-08-01 00:00:00 | 45.65 | 27.375 | 25.525 | | | | | |
| 4 | 1983-09-01 00:00:00 | 45.533333 | 26.666667 | 26.266667 | | | | | |
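`.clean_names()` is configurable and handles more cases than this, but for simple headers like these its effect can be approximated by lower-casing each name and collapsing runs of non-alphanumeric characters into underscores. The helper below, `approx_clean_names`, is hypothetical and is not PyJanitor's implementation; the `"Vote Share (%)"` header is made up to show the punctuation handling.

```python
import re

import pandas as pd


def approx_clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical approximation of janitor's clean_names(), for illustration only."""

    def clean(name: str) -> str:
        name = name.strip().lower()
        name = re.sub(r"[^0-9a-z]+", "_", name)  # collapse spaces/punctuation to "_"
        return name.strip("_")  # drop leading/trailing underscores

    return df.rename(columns=lambda col: clean(str(col)))


toy = pd.DataFrame(columns=["Date", "Conservative", "LD", "Vote Share (%)"])
print(list(approx_clean_names(toy).columns))
```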


#### 5\. Filter out rows not pertaining to opinion polls

Now let's remove the rows whose `date` value is the string `"GE"` (these mark UK General Elections rather than opinion polls). We could do this using `df.query("date != 'GE'")`, but I'll use the opportunity to demo the convenient and very powerful `.filter_column_isin()` method. It is a method-chain-friendly equivalent of `df = df[df["colour"].isin(["red", "green", "blue"])]`, with optional negation via the `complement=True` argument.

```python
df = df.filter_column_isin(
    column_name="date",
    iterable=["GE"],
    complement=True,
)
```

```python
df.head()
```

| | date | conservative | labour | ld | ukip | sdp | tig | bxp | green |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1983-06-01 00:00:00 | 43.0 | 27.0 | 29.0 | | | | | |
| 2 | 1983-07-01 00:00:00 | 44.8 | 28.633333 | 25.1 | | | | | |
| 3 | 1983-08-01 00:00:00 | 45.65 | 27.375 | 25.525 | | | | | |
| 4 | 1983-09-01 00:00:00 | 45.533333 | 26.666667 | 26.266667 | | | | | |
| 5 | 1983-10-01 00:00:00 | 42.38 | 35.4 | 20.76 | | | | | |
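For reference, the two pandas-only routes to the same filter look like this: a negated boolean mask built with `.isin()` (what `complement=True` corresponds to) and the `.query()` alternative. The three-row DataFrame is a made-up stand-in for the poll data.

```python
import pandas as pd

# Toy version (made up) of the "date" column, where "GE" marks a general election
toy = pd.DataFrame(
    {
        "date": ["GE", "1983-06-01", "1983-07-01"],
        "conservative": [43.5, 43.0, 44.8],
    }
)

# Negated boolean mask: keep rows whose date is NOT in the list
masked = toy[~toy["date"].isin(["GE"])]

# The query() alternative
queried = toy.query("date != 'GE'")

print(masked)
```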


#### 6\. Pivot to long

Pandas has powerful functionality for reshaping data, although personally I find the method names unintuitive and tricky to remember. They're my #1 reason for referencing the (highly recommended) [official pandas cheatsheet](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL3BhbmRhcy5weWRhdGEub3JnL1BhbmRhc19DaGVhdF9TaGVldC5wZGY_dXRtX3NvdXJjZT1weS1hZHZlbnQtY2FsZW5kYXIuYmVlaGlpdi5jb20mdXRtX21lZGl1bT1yZWZlcnJhbCZ1dG1fY2FtcGFpZ249ZGF5LTMtdHJhbnNmb3JtLXlvdXItZGF0YS10cmFuc2Zvcm1hdGlvbnMiLCJwb3N0X2lkIjoiNzY1ZWQxZWItYzhiZS00MTBlLWE2NTEtNTM1ZTdkYmIyMDNmIiwicHVibGljYXRpb25faWQiOiIyNDExYzBjNi05ZWU2LTQ0NTQtYTBkYS0xYjZjNWI0OGU0NGIiLCJ2aXNpdF90b2tlbiI6IjhhZWIzMTIyLWZjYjktNGY2Ni04M2Y3LTVlZjNjOTRhMWU1ZSIsImlhdCI6MTcwMTY0OTQwMCwiaXNzIjoib3JjaGlkIn0.eecv3FY1pVk-iggwKC0UD8rw4VbXV2a9enYpdPvlMig):

<img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/5b3b04a3-502c-4856-b672-585a865a17af/image.png?t=1701647070" width=800 />

Visually, it appears we can use the `pd.melt()` function to turn our wide DataFrame into a long one. As a top-level function, though, it doesn't sit naturally in a method chain (pandas does also offer an equivalent `DataFrame.melt()` method), and its name is easy to forget. PyJanitor introduces two intuitively named methods: `.pivot_longer()` and `.pivot_wider()`.

```python
df = df.pivot_longer(
    index="date",
    names_to="party",
    values_to="voteshare",
)
```

```python
df.head()
```

| | date | party | voteshare |
|---|---|---|---|
| 0 | 1983-06-01 | conservative | 43.0 |
| 1 | 1983-07-01 | conservative | 44.8 |
| 2 | 1983-08-01 | conservative | 45.65 |
| 3 | 1983-09-01 | conservative | 45.533333 |
| 4 | 1983-10-01 | conservative | 42.38 |
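For comparison, the chainable pandas method `DataFrame.melt()` produces the same long layout. Its `id_vars`, `var_name` and `value_name` arguments roughly correspond to `pivot_longer`'s `index`, `names_to` and `values_to`. The small wide DataFrame below uses made-up values.

```python
import pandas as pd

# Toy wide data (made up values, same layout as the poll table)
wide = pd.DataFrame(
    {
        "date": ["1983-06-01", "1983-07-01"],
        "conservative": [43.0, 44.8],
        "labour": [27.0, 28.6],
    }
)

# One row per (date, party) pair, stacking the party columns one after another
long = wide.melt(id_vars="date", var_name="party", value_name="voteshare")
print(long)
```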


#### 7\. Encode columns as Categorical

Our DataFrame is beginning to look model-ready! With the new `party` column containing a small set of repeated strings, it makes sense to convert it to pandas' [Categorical data type](https://flight.beehiiv.net/v2/clicks/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1cmwiOiJodHRwczovL3BhbmRhcy5weWRhdGEub3JnL2RvY3MvdXNlcl9ndWlkZS9jYXRlZ29yaWNhbC5odG1sP3V0bV9zb3VyY2U9cHktYWR2ZW50LWNhbGVuZGFyLmJlZWhpaXYuY29tJnV0bV9tZWRpdW09cmVmZXJyYWwmdXRtX2NhbXBhaWduPWRheS0zLXRyYW5zZm9ybS15b3VyLWRhdGEtdHJhbnNmb3JtYXRpb25zIiwicG9zdF9pZCI6Ijc2NWVkMWViLWM4YmUtNDEwZS1hNjUxLTUzNWU3ZGJiMjAzZiIsInB1YmxpY2F0aW9uX2lkIjoiMjQxMWMwYzYtOWVlNi00NDU0LWEwZGEtMWI2YzViNDhlNDRiIiwidmlzaXRfdG9rZW4iOiI4YWViMzEyMi1mY2I5LTRmNjYtODNmNy01ZWYzYzk0YTFlNWUiLCJpYXQiOjE3MDE2NDk0MDAsImlzcyI6Im9yY2hpZCJ9.jNWhmlCZb4M0HL9V9l3aGd6jJmjX4mECEJaTiXSi4Dk). This can be preferable for memory and performance reasons, and it also signals the data type to other libraries, which can use that metadata to automatically adapt data visualisations or statistical tests.

```python
df = df.encode_categorical(["party"])
print(df.party)
```

```
0       conservative
1       conservative
2       conservative
3       conservative
4       conservative
            ...     
4035           green
4036           green
4037           green
4038           green
4039           green
Name: party, Length: 4040, dtype: category
Categories (8, object): ['bxp', 'conservative', 'green', 'labour', 'ld', 'sdp', 'tig', 'ukip']
```
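In this simple case, the conversion is comparable to pandas' own `astype("category")`. The sketch below uses 4,000 made-up repeated party labels to show the sorted categories and the memory saving the paragraph above alludes to; exact byte counts depend on your pandas version, so only the comparison is meaningful.

```python
import pandas as pd

# 4,000 repeated party labels (made up), similar in spirit to the long poll table
parties = pd.Series(["conservative", "labour", "ld", "green"] * 1000)

as_category = parties.astype("category")

# Categories are the sorted unique values; each row stores only a small integer code
print(as_category.cat.categories.tolist())
print(parties.memory_usage(deep=True), "bytes as plain strings")
print(as_category.memory_usage(deep=True), "bytes as category")
```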


#### 8\. One-hot encoding with category expansion

Another modelling convenience method, similar to pandas' `pd.get_dummies()` or scikit-learn's `OneHotEncoder`:

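```python
df.expand_column("party")
```

| | date | party | voteshare | bxp | conservative | green | labour | ld | sdp | tig | ukip |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1983-06-01 | conservative | 43.000000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1983-07-01 | conservative | 44.800000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1983-08-01 | conservative | 45.650000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1983-09-01 | conservative | 45.533333 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1983-10-01 | conservative | 42.380000 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4035 | 2024-08-01 | green | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4036 | 2024-09-01 | green | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4037 | 2024-10-01 | green | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4038 | 2024-11-01 | green | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |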
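A pandas-only sketch of a similar expansion: `pd.get_dummies()` builds one 0/1 indicator column per party, and `.join()` attaches them back onto the original rows. The three-row DataFrame is made up, and this is a comparable result rather than a reproduction of PyJanitor's internals.

```python
import pandas as pd

# Toy long-format data (made up)
long = pd.DataFrame(
    {
        "date": ["1983-06-01", "1983-06-01", "1983-07-01"],
        "party": ["conservative", "labour", "conservative"],
        "voteshare": [43.0, 27.0, 44.8],
    }
)

# One indicator column per party, stored as 0/1 integers, joined on the shared index
dummies = pd.get_dummies(long["party"], dtype=int)
expanded = long.join(dummies)
print(expanded)
```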


#### 9\. Splitting data into features & target for machine learning

It's common to create a features DataFrame with `X = df.drop(columns=target)` and a target with `y = df[target]` when working with pandas and modelling libraries such as scikit-learn or statsmodels. PyJanitor provides a convenience method for this, `.get_features_targets()`:

```python
X, y = df.get_features_targets(target_column_names=["voteshare"])
```

```python
X.head(2)
```

| | date | party |
|---|---|---|
| 0 | 1983-06-01 | conservative |
| 1 | 1983-07-01 | conservative |


```python
y.head(2)
```

| | voteshare |
|---|---|
| 0 | 43.0 |
| 1 | 44.8 |
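The hand-written pandas pattern from the paragraph above looks like this on a small made-up DataFrame. Passing the target as a list keeps `y` as a one-column DataFrame, consistent with the `y.head(2)` output shown.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": ["1983-06-01", "1983-07-01"],
        "party": ["conservative", "conservative"],
        "voteshare": [43.0, 44.8],
    }
)

target = ["voteshare"]  # a list, so y stays a DataFrame rather than a Series

X = df.drop(columns=target)  # every column except the target(s)
y = df[target]  # just the target column(s)
print(X.columns.tolist(), y.columns.tolist())
```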


#### 10\. The best of the rest

The above gives an idea of the variety of convenience methods built into PyJanitor. It also includes functionality for: 

-   **Coalescing columns** (like the SQL `COALESCE()` function) 

-   **Concatenation** (e.g. for creating a unique index) and **deconcatenation** (great for working with log data) 

-   **Date conversion helpers** to convert from Unix epochs, Excel integer timestamps, and Matlab-style timestamps. 

-   **Handy tools** like `.complete()` (the "opposite of dropna"), `.conditional_join()`, `.count_cumulative_unique()`, `.currency_column_to_numeric()`, `.groupby_topk()`, `.round_to_fraction()`, `.sort_naturally()` and `.update_where()`

-   **Industry-specific functionality**, such as working with FASTA bioinformatics files, generating RDKit molecular descriptors, converting between various engineering units, currency converters, getting a company's name from its ticker symbol, an inflation calculator, reading multiple CSVs into a single DataFrame, mathematical functions such as `sigmoid()` and `softmax()`, and flagging jumps exceeding a threshold in a time-series dataset.
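The list above only names these features. As a hedged illustration of what two of them mean, here is coalescing and deconcatenation written in plain pandas; neither snippet uses PyJanitor's own API, and the column names and log lines are made up.

```python
import numpy as np
import pandas as pd

# Coalescing (made-up data): take the first non-missing value, left to right
df = pd.DataFrame(
    {
        "primary": [1.0, np.nan, np.nan],
        "backup": [9.0, 2.0, np.nan],
    }
)
# Like SQL COALESCE(primary, backup, 0)
df["value"] = df["primary"].fillna(df["backup"]).fillna(0)
print(df)

# Deconcatenation (made-up log lines): split one delimited column into several
logs = pd.Series(["2023-12-03|INFO|started", "2023-12-03|WARN|slow"])
parts = logs.str.split("|", expand=True)  # a single-character pattern is taken literally
parts.columns = ["day", "level", "message"]
print(parts)
```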

```python
# convert_excel_date()
df = pd.DataFrame({"date": [39690, 39690, 37118]})
print(df)
df.convert_excel_date("date")
```

```
    date
0  39690
1  39690
2  37118
```

| | date |
|---|---|
| 0 | 2008-08-30 |
| 1 | 2008-08-30 |
| 2 | 2001-08-15 |
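The same conversion can be sketched with pandas alone by treating the serials as day counts from an origin of 1899-12-30. This assumes Excel's default 1900 date system and dates after February 1900 (Excel counts a non-existent 29 February 1900, which is why the origin is not 1900-01-01).

```python
import pandas as pd

serials = pd.Series([39690, 39690, 37118])

# Interpret each integer as a number of days after 1899-12-30
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")
print(dates)
```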


```python
# filter_column_isin()
df = pd.DataFrame({"names": ["Jane", "Jeremy", "John"], "foo": list("xyz")})
print(df)
df.filter_column_isin(column_name="names", iterable=["James", "John"])
```

```
    names foo
0    Jane   x
1  Jeremy   y
2    John   z
```

| | names | foo |
|---|---|---|
| 2 | John | z |


```python
# The above is equivalent to
df[df["names"].isin(["James", "John"])]
```

| | names | foo |
|---|---|---|
| 2 | John | z |


```python
# filter_string()
df = pd.DataFrame({"a": range(3, 6), "b": ["bear", "peeL", "sail"]})
print(df)
df.filter_string(column_name="b", search_string="ee")
```

```
   a     b
0  3  bear
1  4  peeL
2  5  sail
```

| | a | b |
|---|---|---|
| 1 | 4 | peeL |
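As with `filter_column_isin()`, a plain pandas counterpart exists: a boolean mask from `Series.str.contains()`. Matching is case-sensitive by default, which the second check below demonstrates.

```python
import pandas as pd

df = pd.DataFrame({"a": range(3, 6), "b": ["bear", "peeL", "sail"]})

# Keep rows whose "b" value contains the substring "ee" (case-sensitive by default)
matches = df[df["b"].str.contains("ee")]
print(matches)

# Upper-case "EE" finds nothing unless case=False is passed
no_matches = df[df["b"].str.contains("EE")]
case_insensitive = df[df["b"].str.contains("EE", case=False)]
```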


---

## All transformations... method chained!

```python
import pandas as pd
import janitor  # noqa: F401 (the import registers PyJanitor's DataFrame methods)

X, y = (
    pd.read_excel("PollBase-Q3-2023.xlsx", sheet_name="Monthly average")

    # Remove empty rows/columns
    .remove_empty()

    # Eliminate extraneous columns
    .rename(columns={"Unnamed: 1": "Date"})
    .select_columns("Unnamed*", invert=True)
    .select_columns("Lead", invert=True)

    # Auto-clean column names
    .clean_names()

    # Filter out rows not pertaining to opinion polls
    .filter_column_isin(
        column_name="date",
        iterable=["GE"],
        complement=True,
    )

    # Pivot to long
    .pivot_longer(
        index="date",
        names_to="party",
        values_to="voteshare",
    )

    # Encode columns as Categorical
    .encode_categorical(["party"])

    # One-hot encoding with category expansion
    .expand_column("party")

    # Splitting data into features & target for machine learning
    .get_features_targets(target_column_names=["voteshare"])
)
```



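```python
X.head()
```

| | date | party | bxp | conservative | green | labour | ld | sdp | tig | ukip |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1983-06-01 | conservative | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1983-07-01 | conservative | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1983-08-01 | conservative | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1983-09-01 | conservative | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1983-10-01 | conservative | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |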


```python
y.head(2)
```

| | voteshare |
|---|---|
| 0 | 43.0 |
| 1 | 44.8 |


---

<div class="alert alert-block alert-success">
<h1>🎊🎄  Congratulations!  🐍🚀 </h1>

<p>
    👤 This notebook has been made by <a href="https://twitter.com/john_sandall">@John_Sandall</a> and the team at <a href="https://twitter.com/CoefficientData">@CoefficientData</a>. We run training workshops in Python, data science and data engineering.
</p><br/>

<p>
    🎓 If you are interested in registering for our <strong>paid workshops in Python for data science and engineering</strong>, you can <a href="https://coefficient.ai/learn-python">sign up to our workshops mailing list here</a>.
</p><br/>

<p>
    🎬 You can follow my <a href="https://github.com/pydatabristol/workshops/tree/master/workshop_2019_10_28_first_steps"><em>First Steps with Python</em></a> and <a href="https://github.com/pydatabristol/workshops/tree/master/workshop_2020_02_27_first_steps_with_pandas"><em>First Steps with pandas</em></a> workshops for free as part of <a href="https://www.meetup.com/PyData-Bristol/">PyData Bristol's</a> Zero To Hero workshop series. If you'd like to learn more <strong>Jupyter tips &amp; tricks</strong> you may be interested in my event with Ben Sparks from <a href="http://bit.ly/Numberphile_Sub">Numberphile</a> where we explored simulating viral outbreaks with <strong>SIR models</strong>, <strong>interactive Jupyter Widgets</strong> and <strong>animated matplotlib charts</strong> in <a href="https://www.crowdcast.io/e/pydata1/register"><em>Building An Interactive Coronavirus Model In Jupyter w/ Ben Sparks</em></a>.
</p><br/>

<p>
    💼 I am the Founder of data science consultancy <a href="https://coefficient.ai/">Coefficient</a>. If you would like to work with us, our team can help you with your <a href="https://www.youtube.com/watch?v=qBvO3fyl1lk">data science</a>, <a href="https://coefficient.ai/#services-page">software engineering</a> and <a href="https://coefficient.ai/#machine-learning-page">machine learning</a> projects as an on-demand resource. We can also create <a href="https://coefficient.ai/#training-page">bespoke training workshops</a> adapted to your industry, virtual or in-person, with training clients currently including BNP Paribas, EY, the Met Police and the BBC.
</p>

</div>