# Architecture

* /data/poses/{SET}/*.md : one Markdown file per pose, with unique filename/ID. SET = {"cards", "web"}
* /data/poses/{SET}/*.clp : CLIPS rules to build flows. SET = {"cards", "web"}
* /bin/exec-clips : CLI command to execute CLIPS
* /bin/exec-rag: CLI command to execute RAG
* /www/json-rpc: takes json-rpc call & executes correct command in bin
* /www/index.html: web interface using jquery.terminal, multiple terminals on a single page
* README.md: info about project
* playground.ipynb
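
The `/www/json-rpc` entry above maps a JSON-RPC call onto one of the commands in `/bin`. A minimal sketch of that dispatch step follows. The method names `clips` and `rag`, the `COMMANDS` table, and the `handle_request` function are assumptions made for illustration, not the project's actual interface; only the error codes come from the JSON-RPC 2.0 specification.

```python
import json
import subprocess

# Hypothetical mapping from JSON-RPC method names to the CLI commands in /bin.
COMMANDS = {
    "clips": "/bin/exec-clips",
    "rag": "/bin/exec-rag",
}


def _error(request_id, code, message):
    """Serialize a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "error": {"code": code, "message": message}})


def handle_request(raw, run=None):
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    if run is None:
        # Default runner: execute the command and capture its standard output.
        def run(argv):
            done = subprocess.run(argv, capture_output=True, text=True, check=True)
            return done.stdout

    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if not isinstance(request, dict):
        return _error(None, -32600, "Invalid Request")

    request_id = request.get("id")
    command = COMMANDS.get(request.get("method"))
    if command is None:
        return _error(request_id, -32601, "Method not found")

    params = request.get("params", [])
    if not isinstance(params, list):
        return _error(request_id, -32602, "Invalid params")

    # Positional params are passed to the command as plain string arguments.
    argv = [command] + [str(p) for p in params]
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": run(argv)})
```

Passing a fake `run` callable makes the dispatcher testable without the real binaries; in the web interface each terminal would send one such request per command line.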

# Notes

["Clean Jupyter Notebooks"](https://ploomber.io/blog/clean-nbs/)

# Dataset

## Description

1) A set of yoga poses extracted from the 50 cards of a yoga deck. The cards were scanned, then OCR'd.
2) The initial poses, augmented with yoga poses crawled from websites such as pocketyoga.com and yogajournal.com.

Structure:

* One Markdown document per pose, containing the description of that pose.
* A set of transition rules to go from one pose to another.
* A set of benefit and counter-indication rules.
* A set of flow construction rules.
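
To make the relationship between transition rules and flow construction concrete, here is a small sketch in Python. It is not the project's rule engine (the rules themselves are meant to live in CLIPS files); the pose IDs, the `TRANSITIONS` table, and `find_flow` are hypothetical and only illustrate how a flow could be derived from "pose A may be followed by pose B" facts.

```python
from collections import deque

# Hypothetical transition facts: pose ID -> pose IDs that may follow it.
TRANSITIONS = {
    "mountain": ["forward-fold", "chair"],
    "forward-fold": ["plank", "mountain"],
    "plank": ["downward-dog"],
    "chair": ["forward-fold"],
    "downward-dog": [],
}


def find_flow(start, goal, transitions):
    """Return the shortest pose sequence from start to goal, or None.

    Uses a breadth-first search over the transition graph.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in transitions.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_flow("mountain", "downward-dog", TRANSITIONS))
# ['mountain', 'forward-fold', 'plank', 'downward-dog']
```

Benefit and counter-indication rules could then act as filters on the candidate poses before the search runs.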

Format:

* Markdown for the individual documents.
* CLIPS, RDF, or OWL for the rules and facts. RDF/OWL is more popular, but CLIPS seems better suited to our use case. There are currently no tools to convert between the formats. Should we define our own DSL instead?

## Sources

* https://yogajournal.com/poses
* https://pocketyoga.com/poses
* https://www.tummee.com/yoga-poses : anti-crawling measures in place
* https://www.yogapedia.com/yoga-poses

## Creation Method

* Input: .HEIC picture of each yoga deck card.
* Processing: convert to PNG, grayscale, invert, increase contrast.
* OCR: convert to text with VNRecognizeTextRequest from the macOS Vision framework.
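
The processing step above (convert to PNG, grayscale, invert, increase contrast) could be expressed as a single ImageMagick call per scan. The sketch below only builds the command line. The function name `preprocess_command`, the output directory, and the choice of `-normalize` for the contrast step are assumptions; the actual options used for this dataset are not recorded here. The binary is `magick` in ImageMagick 7 and `convert` in ImageMagick 6.

```python
from pathlib import Path


def preprocess_command(heic_path, out_dir, binary="magick"):
    """Build an ImageMagick argv that turns one HEIC scan into a high-contrast PNG."""
    src = Path(heic_path)
    dst = Path(out_dir) / (src.stem + ".png")
    return [
        binary,
        str(src),
        "-colorspace", "Gray",  # convert to grayscale
        "-negate",              # invert the image
        "-normalize",           # stretch the contrast (assumed choice)
        str(dst),               # the .png extension selects the output format
    ]


cmd = preprocess_command("data/scans/IMG_0466.HEIC", "data/png")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; reading `.HEIC` input requires an ImageMagick build with libheif support.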

In [1]:
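#!/usr/bin/python
import chromadb
import ollama

# Set up a persistent ChromaDB client backed by ./data
client = chromadb.PersistentClient(path="./data")

# Stream a chat completion from the local llama3 model via Ollama
stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

# Print each chunk as soon as it arrives
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)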

What a great question!

The short answer is: scattering of light by tiny particles in the atmosphere.

Here's a more detailed explanation:

When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases like nitrogen (N2) and oxygen (O2). These molecules are much smaller than the wavelength of visible light (around 400-700 nanometers), so they don't absorb or reflect light significantly. However, they do scatter the light in all directions.

The scattering effect is more pronounced for shorter wavelengths, like blue and violet light, which are scattered more than longer wavelengths, like red and orange light. This is known as Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described the phenomenon in the late 19th century.

As a result of this scattering, our eyes perceive the blue light that's being scattered in all directions from the sun, giving the sky its blue appearance. The color we see is actually a combination of the direct sunli

# Dataset Preparation

## OCR With Tesseract

We need to install tesseract for OCR, imagemagick for preprocessing, and libheif to read the .HEIC files. As a final correction step, we pass the text to GPT and ask it to fix it, because Tesseract detects too many diacritics and weird characters. I tried corrections with a local llama3 model (via ollama), but those were worse.

In [None]:
!brew install tesseract
!brew install imagemagick
!brew install libheif
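
The correction step described above relies on GPT. As a cheap, deterministic pre-pass before that step, spurious diacritics and control characters can be stripped with the standard library. This is a different technique from the GPT correction, shown only as a sketch; `strip_diacritics` is a hypothetical helper, and it would also remove legitimate diacritics in Sanskrit pose names, so it may not suit every card.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks and non-printable characters from OCR output."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and (ch.isprintable() or ch in "\n\t")
    ]
    return unicodedata.normalize("NFC", "".join(kept))


print(strip_diacritics("Vírásána pösé"))
# Virasana pose
```

Form feeds and other control characters in the raw OCR text are removed by the same filter, while newlines and tabs are kept.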

In [32]:
!python data/youtube.py

  k = self.parse_starttag(i)


- Hey everyone, welcome
to Yoga With Adriene.
I'm Adriene, and today on
the Foundations of Yoga,
we're learning Hero Pose or Virasana.
This posture's awesome,
right before bedtime,
this is a great set up for meditation.
This is something that you
probably will come across
in a public class and it's a tender one
so I think this is a great one to learn,
especially if you have sensitive knees,
but for everyone who wants to be
super mindful in their practice.
We'll learn how to set it
up with props that you have
on hand at home, as well as without.
So, you're gonna need a block
or a book, something firm,
and then two towels.
So, if you have two Yoga blankets, great,
otherwise, just two towels.
So, take a second to go grab those things
and let's hop on the mat and get started.
(upbeat music)
Alright, let's break this thing down.
So, I have my two towels here.
Notice we have towels
instead of Yoga blankets.
You know, I have a big
sack of Yoga blankets
but chanc

In [26]:
!/bin/zsh data/ocr.sh

Processing ./data/scans/IMG_0466.HEIC
Processing ./data/scans/IMG_0467.HEIC
Processing ./data/scans/IMG_0468.HEIC
Processing ./data/scans/IMG_0469.HEIC
Processing ./data/scans/IMG_0470.HEIC
Processing ./data/scans/IMG_0471.HEIC
Processing ./data/scans/IMG_0472.HEIC
Processing ./data/scans/IMG_0473.HEIC
Processing ./data/scans/IMG_0474.HEIC
Processing ./data/scans/IMG_0475.HEIC
Processing ./data/scans/IMG_0476.HEIC
Processing ./data/scans/IMG_0477.HEIC
Processing ./data/scans/IMG_0478.HEIC
Processing ./data/scans/IMG_0479.HEIC
Processing ./data/scans/IMG_0480.HEIC
Processing ./data/scans/IMG_0481.HEIC
Processing ./data/scans/IMG_0482.HEIC
Processing ./data/scans/IMG_0483.HEIC
Processing ./data/scans/IMG_0484.HEIC
Processing ./data/scans/IMG_0485.HEIC
Processing ./data/scans/IMG_0486.HEIC
Processing ./data/scans/IMG_0487.HEIC
Processing ./data/scans/IMG_0488.HEIC
Processing ./data/scans/IMG_0489.HEIC
Processing ./data/scans/IMG_0490.HEIC
Processing ./data/scans/IMG_0491.HEIC
Processing .