#### Clarusway Python

* [Instructor Landing Page](landing_page.ipynb)
* <a href="https://colab.research.google.com/github/4dsolutions/clarusway_data_analysis/blob/main/Kirby%20Notebooks/DAwPy_sandbox.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>
* [![nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/4dsolutions/clarusway_data_analysis/blob/main/Kirby%20Notebooks/DAwPy_sandbox.ipynb)

<a id="toc"></a>

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/52136642608/in/photolist-2n4sSUz-2nr8Vrb-2oADYNY" title="Clarusway Banner"><img src="https://live.staticflickr.com/65535/52136642608_bd45cb00a9_b.jpg" width="1024" height="334" alt="Clarusway Banner"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

## <p style="background-color:#0D8D99; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Looking Back: The pandas DataFrame<br>Looking Ahead: to SQL</p>

Highly relevant at this juncture, when our focus is on table management in pandas, including combining tables based on columns-in-common, are the conceptual similarities with SQL (Structured Query Language). The vocabulary (shoptalk) of inner, outer, left and right joins, in turn inherited from Set Theory, spans both technologies, pandas and SQL.
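
As a quick refresher on that join vocabulary, here is a minimal sketch using two tiny made-up tables (`names` and `coords` are hypothetical, not taken from the airports data). Only the `how` argument to `pd.merge` changes from one join to the next.

```python
import pandas as pd

# Two tiny hypothetical tables sharing an "iata" key column.
names = pd.DataFrame({"iata": ["PDX", "SFO", "SEA"],
                      "name": ["Portland", "San Francisco", "Seattle"]})
coords = pd.DataFrame({"iata": ["PDX", "SFO", "LAX"],
                       "lat": [45.59, 37.62, 33.94]})

# One merge per join type; only `how` changes.
joins = {how: pd.merge(names, coords, on="iata", how=how)
         for how in ("inner", "left", "right", "outer")}

for how, table in joins.items():
    # inner keeps keys found in both tables, left/right keep one side's keys,
    # outer keeps the union of keys.
    print(how, len(table), sorted(table["iata"]))
```

Rows with no partner on the other side come back with `NaN` in the columns that side would have supplied.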

In [1]:
import pandas as pd
import numpy as np
from os import path

Pythonistas enjoy the good fortune of having SQLite in the Standard Library. SQLite is a free open source tool that has a role in production, in the office setting, and as an onramp into RDBMS (relational database management systems) more generally.

In [2]:
import sqlite3 as sql  # part of Python Standard Library

Connecting to a database through a context manager has advantages. Connecting to a DB is akin to opening a file: closure is automatic once the code block is exited, with or without unhandled exceptions.

We looked at the context manager pattern in Basic Python. Like the Iterator category, we recognize context managers by the presence of signature magic methods (also known as special names). 

In the case of the Iterator, we look for `__next__` and `__iter__` where the latter might return itself, as eligible for the office of iterator. In the case of a Context Manager, we expect to find `__enter__` and `__exit__`.
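
As a reminder of the Iterator pattern, here is a minimal sketch. The `Countdown` class is a hypothetical example, not something defined elsewhere in this notebook.

```python
class Countdown:
    """Hypothetical iterator counting down from start to 1."""

    def __init__(self, start: int):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can drive a for loop directly.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the consumer we are done
        value = self.current
        self.current -= 1
        return value

countdown = list(Countdown(3))
print(countdown)
```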

We learned how these two methods get triggered: not by directly calling them, but by the "occasions" of entering and exiting code suites set off by the `with` statement, `with` being one of Python's keywords.

Where we most likely encounter `with` in basic Python is in connection with file objects, which open and close upon entering and exiting the suite. The keyword `as` gives us access to the Context Manager itself as a presiding object (e.g. `cm` below).

We say:

```python
    with open("the_file.txt") as cm:
        content = cm.read()
```    

Likewise, our Connector class below wraps a database connection and cursor inside the instance, once `__enter__` has established them as attributes of the presiding object.

```python
    with Connector("airports.db") as db:
        db.list_tables()
``` 

Upon exiting the `with` suite, the connection closes, and any exceptions get handled or reraised.
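
Whether an exception gets handled or reraised depends on what `__exit__` returns. The sketch below uses a hypothetical `Tracker` class (not part of this notebook) to show both outcomes: a true return value swallows the exception, a false one lets it propagate.

```python
class Tracker:
    """Hypothetical context manager that records what happens to it."""

    def __init__(self, suppress: bool):
        self.suppress = suppress
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # becomes the name bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        if exc_type is not None:
            self.events.append(exc_type.__name__)
        # True swallows the exception; False lets it propagate.
        return self.suppress

with Tracker(suppress=True) as quiet:
    raise ValueError("swallowed")

loud = Tracker(suppress=False)
try:
    with loud:
        raise ValueError("reraised")
except ValueError:
    loud.events.append("caught outside")

print(quiet.events)
print(loud.events)
```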

The context manager object may optionally be equipped with additional DB-related methods, such as returning a listing of tables and/or performing a record lookup.

In [3]:
class Connector:

    def __init__(self, conn_name : str):
        """Run when class is called"""
        self.cn_name = conn_name # what file?
        
    def __enter__(self):
        """Run when the context is entered"""
        try:
            self.conn = sql.connect(self.cn_name)
            print("Connection: ", self.conn)
            self.curs = self.conn.cursor()
            # self.list_tables() # optional
        except:
            print("No connection")
            raise

        return self
    
    def lookup(self, table, column, code):
        """
        return the data for column = code condition
        """
        self.curs.execute(f"SELECT * FROM {table} WHERE {column} = ?", (code, ))
        return self.curs.fetchone() # could be None, could be a tuple
    
    def list_tables(self):
        """
        print a listing of all the tables in this db
        https://www.sqlitetutorial.net/sqlite-show-tables/
        """
        self.curs.execute("""SELECT name FROM sqlite_schema  
                            WHERE type ='table' AND name 
                            NOT LIKE 'sqlite_%';
                            """)    
        # loop through whatever table names were found 
        # and filtered and print them out.
        for nm in self.curs.fetchall():
            print(nm)
         
    def __exit__(self, *oops):
        """
        *oops collects the three exception arguments
        (type, value, traceback), which we hope are all
        None because all went well. Otherwise they carry
        the exception info. Returning False lets a pending
        exception propagate; returning True would suppress it.
        """
        self.conn.close()
        if oops[0]:
            print("An error occurred")
            return False  # let the exception propagate
        return True       # all good
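
A note on `lookup`: the value being searched for is bound through a `?` placeholder, which SQLite escapes for us, but table and column names cannot be bound that way and are interpolated into the SQL text. One possible precaution, sketched here with a hypothetical `safe_lookup` function and a made-up `ALLOWED` set, is to vet identifiers before building the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Airports (iata TEXT, name TEXT)")
conn.executemany("INSERT INTO Airports VALUES (?, ?)",
                 [("PDX", "Portland"), ("SFO", "San Francisco")])

# Hypothetical allow-list of (table, column) pairs we are willing to query.
ALLOWED = {("Airports", "iata"), ("Airports", "name")}

def safe_lookup(conn, table, column, code):
    """Bind the value with ?, but vet the identifiers first."""
    if (table, column) not in ALLOWED:
        raise ValueError(f"unexpected identifier: {table}.{column}")
    cur = conn.execute(f"SELECT * FROM {table} WHERE {column} = ?", (code,))
    return cur.fetchone()  # a tuple, or None when nothing matches

found = safe_lookup(conn, "Airports", "iata", "PDX")
missing = safe_lookup(conn, "Airports", "iata", "XXX")
print(found, missing)
conn.close()
```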

The `airports.db` file contains only one table, Airports. This is a flat file with some information about airports around the world, including their unique IATA code.

A copy of the `airports.db` used here [may be found](https://github.com/4dsolutions/clarusway_data_analysis/blob/main/DVwPY_S6/airports.db) in this GitHub repo. Download the raw file.

Our purpose here is to bring the data into pandas using `sqlite3` and our Connector, and then review our powers to merge and purge, ending up with some new database files as output, such as a relational `big_airports.db` with lat/long coordinates stored separately, linked by IATA code. We create this table more as a test of pandas than to produce output of much practical value.

In [4]:
path.isfile("airports.db")

True

In [5]:
with Connector("airports.db") as db:
    db.list_tables()

Connection:  <sqlite3.Connection object at 0x108e05120>
('Airports',)


In [8]:
with Connector("airports.db") as db:
    df = pd.read_sql("SELECT * FROM Airports", con = db.conn)
    print(db.lookup("Airports", "iata", "SFO"))
    print(db.lookup("Airports", "iata", "PDX"))

Connection:  <sqlite3.Connection object at 0x1093b1e40>
('SFO', 'US', 'San Francisco International Airport', 'NA', 'airport', 37.615215, -122.38988, 'large', 1)
('PDX', 'US', 'Portland International Airport', 'NA', 'airport', 45.588997, -122.5929, 'large', 1)


In [9]:
df

Unnamed: 0,iata,iso,name,continent,type,lat,lon,size,status
0,UTK,MH,Utirik Airport,OC,airport,11.233333,169.866670,small,1
1,FIV,US,Five Finger CG Heliport,,heliport,,,,1
2,FAK,US,False Island Seaplane Base,,seaplanes,,,,1
3,BWS,US,Blaine Municipal Airport,,closed,,,,0
4,WKK,US,Aleknagik / New Airport,,airport,59.277780,-158.611110,medium,1
...,...,...,...,...,...,...,...,...,...
6721,OHE,CN,Gu-Lian Airport,AS,airport,52.921130,122.420590,medium,1
6722,NDG,CN,Qiqihar Sanjiazi Airport,AS,airport,47.316666,123.916664,medium,1
6723,DLC,CN,Zhoushuizi Airport,AS,airport,38.961020,121.539990,large,1
6724,SHE,CN,Taoxian Airport,AS,airport,41.861084,123.426926,large,1


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6726 entries, 0 to 6725
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   iata       6726 non-null   object 
 1   iso        6726 non-null   object 
 2   name       6247 non-null   object 
 3   continent  6726 non-null   object 
 4   type       6726 non-null   object 
 5   lat        6345 non-null   float64
 6   lon        6345 non-null   float64
 7   size       6546 non-null   object 
 8   status     6726 non-null   int64  
dtypes: float64(2), int64(1), object(6)
memory usage: 473.1+ KB


The description of numeric columns is hardly useful here, as these consist of either categorical values (`status`) or latitude / longitude, which it makes little sense to average.

In [11]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
lat,6345.0,17.057357,29.607346,-54.95,-7.288056,18.7,42.183376,82.51667
lon,6345.0,14.750609,88.34545,-179.33333,-68.07361,15.101389,96.201385,179.93333
status,6726.0,0.990039,0.099316,0.0,1.0,1.0,1.0,1.0


Remember, however, that `describe` may be directed to attend to non-numeric columns as well.

In [13]:
df.describe(include=['O'])

Unnamed: 0,iata,iso,name,continent,type,size
count,6726,6726,6247,6726.0,6726,6546
unique,6632,235,6196,6.0,4,3
top,PRI,US,Santa Maria Airport,,airport,medium
freq,3,682,4,1502.0,6546,3556


In [14]:
df.type.nunique()

4

In [15]:
df.type.unique()

array(['airport', 'heliport', 'seaplanes', 'closed'], dtype=object)

In [16]:
df.groupby(["type"]).agg("count")

Unnamed: 0_level_0,iata,iso,name,continent,lat,lon,size,status
type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
airport,6546,6546,6071,6546,6345,6345,6546,6546
closed,67,67,64,67,0,0,0,67
heliport,80,80,79,80,0,0,0,80
seaplanes,33,33,33,33,0,0,0,33


In [17]:
df.status.nunique()

2

In [18]:
df.status.unique()

array([1, 0])

In [19]:
df["size"].nunique()

3

In [20]:
df["size"].unique()

array(['small', None, 'medium', 'large'], dtype=object)

In [21]:
df["size"].value_counts(dropna=False) # show the Nones

size
medium    3556
small     2485
large      505
None       180
Name: count, dtype: int64

In [22]:
df.dropna(axis=0, how="any", inplace=False)

Unnamed: 0,iata,iso,name,continent,type,lat,lon,size,status
0,UTK,MH,Utirik Airport,OC,airport,11.233333,169.866670,small,1
4,WKK,US,Aleknagik / New Airport,,airport,59.277780,-158.611110,medium,1
6,FOB,US,Fort Bragg Airport,,airport,39.474445,-123.794440,small,1
7,ABP,PG,Atkamba Airport,OC,airport,-6.066667,141.100000,small,1
9,ADC,PG,Andakombe Airport,OC,airport,-7.133333,145.733340,small,1
...,...,...,...,...,...,...,...,...,...
6721,OHE,CN,Gu-Lian Airport,AS,airport,52.921130,122.420590,medium,1
6722,NDG,CN,Qiqihar Sanjiazi Airport,AS,airport,47.316666,123.916664,medium,1
6723,DLC,CN,Zhoushuizi Airport,AS,airport,38.961020,121.539990,large,1
6724,SHE,CN,Taoxian Airport,AS,airport,41.861084,123.426926,large,1


In [23]:
df2 = df.dropna(axis=0, how="any", inplace=False)

In [24]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5896 entries, 0 to 6725
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   iata       5896 non-null   object 
 1   iso        5896 non-null   object 
 2   name       5896 non-null   object 
 3   continent  5896 non-null   object 
 4   type       5896 non-null   object 
 5   lat        5896 non-null   float64
 6   lon        5896 non-null   float64
 7   size       5896 non-null   object 
 8   status     5896 non-null   int64  
dtypes: float64(2), int64(1), object(6)
memory usage: 460.6+ KB


In [25]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

In [26]:
big = df2[(df2["type"] == "airport") & (df2["size"] == "large")].reset_index(drop=True)

In [27]:
medium = df2[(df2["type"] == "airport") & (df2["size"] == "medium")].reset_index(drop=True)

In [28]:
small = df2[(df2["type"] == "airport") & (df2["size"] == "small")].reset_index(drop=True)

In [29]:
df2.loc[:, ["iata", "iso", "name"]]

Unnamed: 0,iata,iso,name
0,UTK,MH,Utirik Airport
4,WKK,US,Aleknagik / New Airport
6,FOB,US,Fort Bragg Airport
7,ABP,PG,Atkamba Airport
9,ADC,PG,Andakombe Airport
...,...,...,...
6721,OHE,CN,Gu-Lian Airport
6722,NDG,CN,Qiqihar Sanjiazi Airport
6723,DLC,CN,Zhoushuizi Airport
6724,SHE,CN,Taoxian Airport


In [30]:
big = big.loc[:, ["iata", "iso", "name"]]
medium = medium.loc[:, ["iata", "iso", "name"]]
small = small.loc[:, ["iata", "iso", "name"]]
latlong = df2.loc[: , ["iata", "continent", "lat", "lon"]]

In [31]:
big.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 473 entries, 0 to 472
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iata    473 non-null    object
 1   iso     473 non-null    object
 2   name    473 non-null    object
dtypes: object(3)
memory usage: 11.2+ KB


In [32]:
medium.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3222 entries, 0 to 3221
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iata    3222 non-null   object
 1   iso     3222 non-null   object
 2   name    3222 non-null   object
dtypes: object(3)
memory usage: 75.6+ KB


In [33]:
small.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2201 entries, 0 to 2200
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iata    2201 non-null   object
 1   iso     2201 non-null   object
 2   name    2201 non-null   object
dtypes: object(3)
memory usage: 51.7+ KB


In [34]:
latlong.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5896 entries, 0 to 6725
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   iata       5896 non-null   object 
 1   continent  5896 non-null   object 
 2   lat        5896 non-null   float64
 3   lon        5896 non-null   float64
dtypes: float64(2), object(2)
memory usage: 230.3+ KB


In [35]:
big.join(latlong.set_index("iata"), on="iata", how="inner", sort=True) # right index set to iata

Unnamed: 0,iata,iso,name,continent,lat,lon
71,AAL,DK,Aalborg Airport,EU,57.086550,9.872241
97,ABQ,US,Albuquerque International Sunport Airport,,35.049625,-106.617195
16,ABV,NG,Nnamdi Azikiwe International Airport,AF,9.004614,7.270447
54,ABZ,GB,Aberdeen Dyce Airport,EU,57.200253,-2.204186
298,ACA,MX,General Juan N Alvarez International Airport,,16.762403,-99.754590
...,...,...,...,...,...,...
10,YXU,CA,London Airport,,43.028020,-81.149650
11,YYC,CA,Calgary International Airport,,51.131393,-114.010550
12,YYJ,CA,Victoria International Airport,,48.640266,-123.430960
13,YYZ,CA,Lester B. Pearson International Airport,,43.681583,-79.611460


In [36]:
pd.merge(big, latlong, how='left', on='iata', sort=True)

Unnamed: 0,iata,iso,name,continent,lat,lon
0,AAL,DK,Aalborg Airport,EU,57.086550,9.872241
1,ABQ,US,Albuquerque International Sunport Airport,,35.049625,-106.617195
2,ABV,NG,Nnamdi Azikiwe International Airport,AF,9.004614,7.270447
3,ABZ,GB,Aberdeen Dyce Airport,EU,57.200253,-2.204186
4,ACA,MX,General Juan N Alvarez International Airport,,16.762403,-99.754590
...,...,...,...,...,...,...
470,YXU,CA,London Airport,,43.028020,-81.149650
471,YYC,CA,Calgary International Airport,,51.131393,-114.010550
472,YYJ,CA,Victoria International Airport,,48.640266,-123.430960
473,YYZ,CA,Lester B. Pearson International Airport,,43.681583,-79.611460


In [37]:
big[big.duplicated('iata')]

Unnamed: 0,iata,iso,name
421,HYD,IN,Rajiv Gandhi Airport


In [38]:
big[big.iata == "HYD"]

Unnamed: 0,iata,iso,name
419,HYD,IN,"Rajiv Gandhi International Airport, Shamshabad"
421,HYD,IN,Rajiv Gandhi Airport


In [39]:
big.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 473 entries, 0 to 472
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iata    473 non-null    object
 1   iso     473 non-null    object
 2   name    473 non-null    object
dtypes: object(3)
memory usage: 11.2+ KB


In [40]:
big = big.drop(index=421)

In [41]:
big.info()

<class 'pandas.core.frame.DataFrame'>
Index: 472 entries, 0 to 472
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   iata    472 non-null    object
 1   iso     472 non-null    object
 2   name    472 non-null    object
dtypes: object(3)
memory usage: 14.8+ KB


In [42]:
latlong.size

23584

In [43]:
~latlong.duplicated('iata')

0        True
4        True
6        True
7        True
9        True
        ...  
6721    False
6722     True
6723     True
6724     True
6725     True
Length: 5896, dtype: bool

In [44]:
df3 = latlong[~latlong.duplicated()]

In [45]:
df3

Unnamed: 0,iata,continent,lat,lon
0,UTK,OC,11.233333,169.866670
4,WKK,,59.277780,-158.611110
6,FOB,,39.474445,-123.794440
7,ABP,OC,-6.066667,141.100000
9,ADC,OC,-7.133333,145.733340
...,...,...,...,...
6720,MDG,AS,44.534943,129.583850
6722,NDG,AS,47.316666,123.916664
6723,DLC,AS,38.961020,121.539990
6724,SHE,AS,41.861084,123.426926


In [46]:
df3.duplicated().value_counts()

False    5857
Name: count, dtype: int64

In [47]:
df3[df3.iata == 'YAX']

Unnamed: 0,iata,continent,lat,lon
315,YAX,,53.251945,-89.565


In [48]:
big_airports = pd.merge(big, df3, how='left', on='iata', sort=True)
big_airports

Unnamed: 0,iata,iso,name,continent,lat,lon
0,AAL,DK,Aalborg Airport,EU,57.086550,9.872241
1,ABQ,US,Albuquerque International Sunport Airport,,35.049625,-106.617195
2,ABV,NG,Nnamdi Azikiwe International Airport,AF,9.004614,7.270447
3,ABZ,GB,Aberdeen Dyce Airport,EU,57.200253,-2.204186
4,ACA,MX,General Juan N Alvarez International Airport,,16.762403,-99.754590
...,...,...,...,...,...,...
467,YXU,CA,London Airport,,43.028020,-81.149650
468,YYC,CA,Calgary International Airport,,51.131393,-114.010550
469,YYJ,CA,Victoria International Airport,,48.640266,-123.430960
470,YYZ,CA,Lester B. Pearson International Airport,,43.681583,-79.611460


In [49]:
big

Unnamed: 0,iata,iso,name
0,TJP,PR,Areopuerto Internacional Michael Gonzalez
1,POM,PG,Port Moresby Jacksons International Airport
2,KEF,IS,Keflavik International Airport
3,YEG,CA,Edmonton International Airport
4,YHZ,CA,Halifax / Stanfield International Airport
...,...,...,...
468,KWE,CN,Longdongbao Airport
469,CTU,CN,Chengdu Shuangliu International Airport
470,HRB,CN,Taiping Airport
471,DLC,CN,Zhoushuizi Airport


In [50]:
df3

Unnamed: 0,iata,continent,lat,lon
0,UTK,OC,11.233333,169.866670
4,WKK,,59.277780,-158.611110
6,FOB,,39.474445,-123.794440
7,ABP,OC,-6.066667,141.100000
9,ADC,OC,-7.133333,145.733340
...,...,...,...,...
6720,MDG,AS,44.534943,129.583850
6722,NDG,AS,47.316666,123.916664
6723,DLC,AS,38.961020,121.539990
6724,SHE,AS,41.861084,123.426926


In [51]:
big_airports.loc[:, ['iata', 'iso', 'name']]

Unnamed: 0,iata,iso,name
0,AAL,DK,Aalborg Airport
1,ABQ,US,Albuquerque International Sunport Airport
2,ABV,NG,Nnamdi Azikiwe International Airport
3,ABZ,GB,Aberdeen Dyce Airport
4,ACA,MX,General Juan N Alvarez International Airport
...,...,...,...
467,YXU,CA,London Airport
468,YYC,CA,Calgary International Airport
469,YYJ,CA,Victoria International Airport
470,YYZ,CA,Lester B. Pearson International Airport


Per [the documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html) for `pandas.DataFrame.to_sql`, this method requires an already-open connection to the database in question, suggesting [SQLAlchemy](https://docs.sqlalchemy.org/en/20/) and/or SQLite may be used, the former being a third-party Python database toolkit, and the latter what we're using here, direct from the Standard Library.

The "tree" or "river delta" diagram below suggests two major user communities, that of website development and that of data science, both of which have their roots in talking to databases.

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/24749338009/in/album-72177720296706479" title="Pythonic Ecosystem"><img src="https://live.staticflickr.com/1624/24749338009_537ab57eb1_w.jpg" width="300" height="400" alt="Pythonic Ecosystem"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

In addition to a connection object (`db.conn` below), the `to_sql` method expects a table name. A database may contain any number of individual tables.

In the code cell below, we might be creating big_airports.db for the first time, or it might be an existing file. Either way, we take our flat file, `big_airports`, and write it out in two tables, Airports and Coords.

In [52]:
with Connector('big_airports.db') as db:
    big_airports.loc[:, ['iata', 'iso', 'name']].to_sql('Airports', db.conn, if_exists='replace')
    big_airports.loc[:, ['iata', 'continent', 'lat', 'lon']].to_sql('Coords', db.conn, if_exists='replace')

Connection:  <sqlite3.Connection object at 0x1093b2a70>
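
One detail worth knowing: by default `to_sql` also writes the DataFrame's index as a column (named `index`), so tables written as above carry an extra leading integer in each row. The sketch below, using a made-up two-row frame and hypothetical table names `WithIndex` and `NoIndex`, contrasts the default with `index=False`.

```python
import sqlite3
import pandas as pd

# A made-up two-row frame standing in for the airports data.
frame = pd.DataFrame({"iata": ["PDX", "SFO"], "iso": ["US", "US"]})

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
frame.to_sql("WithIndex", conn, if_exists="replace")             # index written too
frame.to_sql("NoIndex", conn, if_exists="replace", index=False)  # data columns only

with_index = conn.execute("SELECT * FROM WithIndex").fetchone()
no_index = conn.execute("SELECT * FROM NoIndex").fetchone()
conn.close()

print(with_index)
print(no_index)
```

Whether to keep the index is a judgment call: it is harmless here, but it does change the shape of the tuples a lookup returns.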


As a check, let's reconstitute a flat file pairing airports with corresponding coordinates based on IATA code. An [SQLite inner join](https://www.sqlitetutorial.net/sqlite-inner-join/) will accomplish this.

In [53]:
sql_stmnt = """
SELECT 
    Airports.iata,
    iso,
    name,
    Coords.lat,
    Coords.lon,
    Coords.continent
FROM 
    Airports
INNER JOIN Coords ON 
    Coords.iata = Airports.iata
"""

with Connector("big_airports.db") as db:
    airports = pd.read_sql(sql_stmnt, con = db.conn)
    db.list_tables()
    print(db.lookup("Airports", "iata", "SFO"))
    print(db.lookup("Airports", "iata", "PDX"))

Connection:  <sqlite3.Connection object at 0x1093b1e40>
('Airports',)
('Coords',)
(372, 'SFO', 'US', 'San Francisco International Airport')
(331, 'PDX', 'US', 'Portland International Airport')


In [None]:
airports
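
The same inner join can be tried without pandas or any file on disk. This is a minimal sketch against an in-memory database with made-up rows; the `ZZZ` airport is a hypothetical entry with no coordinates, included to show that an inner join leaves unmatched rows out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE Airports (iata TEXT, name TEXT);
CREATE TABLE Coords (iata TEXT, lat REAL, lon REAL);
INSERT INTO Airports VALUES
    ('PDX', 'Portland'), ('SFO', 'San Francisco'), ('ZZZ', 'No Coords');
INSERT INTO Coords VALUES
    ('PDX', 45.59, -122.59), ('SFO', 37.62, -122.39);
""")

# Only airports with a matching row in Coords survive the inner join.
rows = conn.execute("""
SELECT Airports.iata, name, Coords.lat, Coords.lon
FROM Airports
INNER JOIN Coords ON Coords.iata = Airports.iata
ORDER BY Airports.iata
""").fetchall()
conn.close()

print(rows)
```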

## EXPLORING UNICODE

<a data-flickr-embed="true" href="https://www.flickr.com/photos/kirbyurner/29832307687/in/album-72177720296706479" title="Unicode on Windows"><img src="https://live.staticflickr.com/1847/29832307687_0aee594ec5_w.jpg" width="400" height="276" alt="Unicode on Windows"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

Through Unicode we may access the emojis, which in turn may be used to craft practice dataframes, for learning purposes. Unicode itself, as a topic, permeates our technology, especially when it comes to natural language processing, an important branch of Machine Learning (ML), for example in the form of LLMs (large language models, used to drive chatbots).

In [None]:
import unicodedata as ud

In [None]:
# help(ud)

In [None]:
smiley = ud.lookup("Smiling Face with Smiling Eyes")

In [None]:
smiley

In [None]:
ord(smiley)

In [None]:
ud.name(smiley)

In [None]:
"\N{SMILING FACE WITH SMILING EYES}"

In [None]:
"\N{HOT DOG}"

In [None]:
ord("\N{HOT DOG}")

In [None]:
start = hex(ord("\N{HOT DOG}")) # base 16 as a string
start

In [None]:
dec_start = int(start, base=16) # going back and forth between bases
dec_start

Our first range of emoji starts with hot dog (🌭) and ends with popcorn (🍿).

A great resource for studying the emoji is [at Wikipedia](https://en.wikipedia.org/wiki/List_of_emojis).
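
For a quick look at that range using only the Standard Library, here is a minimal sketch. The `"<unnamed>"` default is an assumption added to guard against Python builds whose Unicode tables predate some of these code points.

```python
import unicodedata as ud

first = ord("\N{HOT DOG}")   # start of the range
last = ord("\N{POPCORN}")    # end of the range, inclusive

# (glyph, name) pairs for every code point from hot dog to popcorn.
emoji = [(chr(cp), ud.name(chr(cp), "<unnamed>"))
         for cp in range(first, last + 1)]

print(hex(first), hex(last), len(emoji))
print(emoji[0], emoji[-1])
```

Working with `ord` and `range` directly avoids the detour through hexadecimal strings, though the hex form is handy for matching code charts.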

In [None]:
"\N{POPCORN}" # the Unicode escape symbol

In [None]:
'🍿'.encode('utf-8')

In [None]:
b'\xf0\x9f\x8d\xbf'.decode()

In [None]:
stop = hex(ord("\N{POPCORN}"))
stop

In [None]:
dec_stop = int(stop, base=16)
dec_stop

In [None]:
code_range = np.arange(dec_start, dec_stop+1)

In [None]:
foods = [chr(codepoint) 
         for codepoint in 
         code_range]

In [None]:
print(foods)

In [None]:
code_range2 = np.arange(0x1f950, 0x1f96f+1)
foods2 = [chr(codepoint) 
         for codepoint in 
         code_range2]
print(foods2)

In [None]:
all_foods = foods + foods2

In [None]:
df_foods = pd.DataFrame({"NAME": [ud.name(food) for food in all_foods],
              "GLYPH": all_foods,
              "CODEPOINT": [ord(food) for food in all_foods]})

In [None]:
df_foods.sort_values("CODEPOINT")

In [None]:
df_foods = df_foods.set_index("GLYPH")

In [None]:
df_foods

In [None]:
df_foods.loc['🍯':'🍵',:]

*Note*:

You may also embed YouTube videos in markdown cells. Notebooks in this repo almost exclusively use the code cell method.

Example:

[![Less Than Jake — Scott Farcas Takes It On The Chin](https://img.youtube.com/vi/PYCxct2e0zI/0.jpg)](https://www.youtube.com/watch?v=PYCxct2e0zI)

[Markdown Guide](https://www.markdownguide.org/hacks/)
