In [2]:
url = 'https://github.com/alexeygrigorev/ai-engineering-buildcamp-code/releases/download/math-book-pages/math-book-pages.tar.gz'

In [3]:
import requests

In [6]:
tag_gz_content = requests.get(url).content

In [7]:
tag_gz_content[:10]

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
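
The first two bytes above, `\x1f\x8b`, are the gzip magic number, and the third, `\x08`, is the deflate compression method. A minimal sketch of a sanity check on the downloaded bytes, useful when a failed download returns an HTML error page instead of the archive. The helper name `looks_like_gzip` is made up for illustration.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(data: bytes) -> bool:
    """Return True if `data` starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# The ten bytes printed in the cell above
sample = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03"

print(looks_like_gzip(sample))              # True
print(looks_like_gzip(b"<!DOCTYPE html>"))  # False
```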

In [9]:
import tarfile
import io

archive = tarfile.open(fileobj=io.BytesIO(tag_gz_content), mode='r:gz')

In [17]:
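# Find the first JSON member; `filename` keeps its name after the loop.
# If the archive has no JSON member, `filename` ends up as the last member instead.
for file_info in archive.getmembers():
    filename = file_info.name

    if not filename.endswith('.json'):
        continue

    break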

In [18]:
file = archive.extractfile(filename)
content = file.read()

In [20]:
import json

In [27]:
filename = 'output/page_092.json'
file = archive.extractfile(filename)
content = file.read()
book_page = json.loads(content)

In [26]:
from models import PageResponse

In [29]:
page_response = PageResponse.model_validate(book_page)  # validate the already-parsed dict

In [32]:
page_response = PageResponse.model_validate_json(content)  # parse and validate the raw JSON in one step

In [39]:
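def block_to_string(block):
    """Render one page block (text, equation, or figure) as plain text."""
    lines = []

    if block.type == 'text':
        lines.append(block.text)

    elif block.type == 'equation':
        # Wrap LaTeX in $$ so equations stay recognizable in the flat text
        lines.append(f'$${block.latex}$$')

    elif block.type == 'figure':
        # A missing caption or description becomes an empty line
        lines.append(block.caption or '')
        lines.append(block.description or '')
        lines.append(f'Fig. {block.figure_number}')

    else:
        # Unknown block type: fall back to its string representation
        lines.append(str(block))

    return "\n".join(lines)

def blocks_to_string(blocks):
    """Render all blocks of a page and join them with newlines."""
    return "\n".join(block_to_string(block) for block in blocks)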

In [40]:
blocks_to_string(page_response.page.blocks)

"Dividing through by $c$: \n$$ c e^t + e^{-t} = (y+c),$$ \nthat is, \n$$ e^{2t} = (y+c)^2 - (c^2 + e^{-2t}). $$ \n\nIf now the axis of $y$ be shifted downwards a distance $c$, then the new ordinate $Y = y+c$ and $Y = \x0crac{(y+c)^2 - (c^2 + e^{-2t})}{e^{-t}} = \x0crac{z + c}{e^{-t}}$.\n\nThe figure illustrates a parabolic curve representing the form of a cable under the influence of gravity. The axes labeled 'True Scale' and 'Construct Scale' show the relationship between the vertical and horizontal distances, with points marked to indicate specific values along the cable's curve.\nFig. 1\nAgain, since $Y=y+c$ \n$$ \x0crac{dY}{dx} = \x0crac{(y+c) + c}{c \text{h} x} $$ \nand also \n$$ \x0crac{dY}{dx} = \x0crac{(y+c) + c}{\text{s} h x \text{s} h}},$$ \nThen \n$$ \x0crac{d^{2}Y}{dx^{2}} = \x0crac{dY}{dx} $$ \nhence \n$$ \x0crac{z}{c} = \text{s}h x or s = \x0crac{c}{\text{s}h x}.$$"

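In the output above, `\x0crac` is a form feed followed by `rac`, and `\text` in a `repr` is a tab followed by `ext`. This is what you get when a single-backslash `\frac` or `\text` is decoded as a JSON escape (`\f`, `\t`). A minimal sketch of a post-processing repair, assuming these two patterns are the only damage; the mapping and the helper name `repair_latex_escapes` are illustrative, and a genuine tab followed by the letters `ext` would be rewritten too.

```python
# Control-character sequences left behind when "\f" and "\t" inside LaTeX
# commands were decoded as JSON escapes. Assumed mapping, not an exhaustive list.
REPAIRS = {
    "\x0c" + "rac": "\\frac",  # form feed + "rac"
    "\t" + "ext": "\\text",    # tab + "ext"
}


def repair_latex_escapes(text: str) -> str:
    """Replace known control-character debris with the intended LaTeX command."""
    for broken, fixed in REPAIRS.items():
        text = text.replace(broken, fixed)
    return text


# One equation taken from the output above
broken_equation = "$$ \x0crac{dY}{dx} = \x0crac{(y+c) + c}{c \text{h} x} $$"
repaired_equation = repair_latex_escapes(broken_equation)
print(repaired_equation)
```

The repair could be applied to each block's text inside `block_to_string`, or once to the joined page string before indexing.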
In [41]:
documents = []

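for file_info in archive.getmembers():
    filename = file_info.name

    # Skip directories and anything that is not a page JSON file
    if not filename.endswith('.json'):
        continue

    # Read the raw JSON bytes and validate them against the page schema
    file = archive.extractfile(filename)
    content = file.read()
    page_response = PageResponse.model_validate_json(content)

    page = page_response.page

    # From here on, `content` holds the flattened page text, not the raw bytes
    content = blocks_to_string(page.blocks)

    doc = {
        'filename': filename,
        'content': content,
    }

    documents.append(doc)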

In [44]:
documents[100]

{'filename': 'output/page_100.json',
 'content': 'If $z$ is a function of $x$ and $y$, i.e., $z = f(x, y)$, the total differential $dz$ is obtained from the partial differentials $dx$ and $dy$ by the use of the following relation:\n$$dz = \\frac{\\partial z}{\\partial x} dx + \\frac{\\partial z}{\\partial y} dy$$\nThe reason for this is more clearly seen if we work from the fundamental idea of $x$, $y$, and introduce the actually measurable quantities like $x$ and $y$.\n[UNRECOGNIZED TEXT]\nThe figure illustrates the relationship between the changes in the variables $x$ and $y$ and how these affect the change in $z$. It shows a point $P$ on a surface and the path of movement to a new position $Q$.\nFig. 21\nThus:\n$$y = \\text{change in } y \\text{ due to the change in } x$$\nThe change in $z$ due to a change in $x$ can be measured by the product of the change in $x$ multiplied by the rate at which $z$ is changing with regard to $x$; and that fact can be better illustrated by referring

In [45]:
archive.close()

In [46]:
with tarfile.open(fileobj=io.BytesIO(tag_gz_content), mode='r:gz') as archive:
    ...
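
The `with` form above closes the archive automatically, so the explicit `archive.close()` is no longer needed. A minimal sketch of the loading loop wrapped this way, using plain `json.loads` instead of `PageResponse` so that it runs on its own. The function name `load_json_members` and the demo archive contents are made up for illustration.

```python
import io
import json
import tarfile


def load_json_members(tar_gz_bytes: bytes) -> list[dict]:
    """Parse every .json member of an in-memory .tar.gz archive."""
    records = []
    with tarfile.open(fileobj=io.BytesIO(tar_gz_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and non-JSON files
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            member_file = archive.extractfile(member)
            records.append(
                {"filename": member.name, "data": json.loads(member_file.read())}
            )
    # The archive is closed at this point, even if parsing raised an exception
    return records


# Build a tiny demo archive in memory (stand-in for the downloaded bytes)
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    demo_files = [
        ("output/page_001.json", b'{"page": 1}'),
        ("output/readme.txt", b"skip me"),
    ]
    for name, payload in demo_files:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

demo_records = load_json_members(buffer.getvalue())
print(demo_records)
```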

In [50]:
from minsearch import Index

index = Index(text_fields=['content'])
index.fit(documents)

<minsearch.minsearch.Index at 0x162c1e9bc50>

In [53]:
index.search('Centre of Gravity and Centroid', num_results=3)

[{'filename': 'output/page_228.json',
  'content': 'Centre of Gravity and Centroid. — The Centre of Gravity of a body is that point at which the resultant of all the forces acting on the body may be supposed to act, i.e., it is the balancing point. The term Centroid has been applied in place of C. G. when dealing with areas; and our work here is more confined towards areas where it will be convenient to adopt the term centroid.\nFrom the definition of its location, the sum of the weights of a body may be supported at its C. G. and it is shown in Mechanics this property is most useful. Thus, movements of complex systems of weights may be reduced to the movement of a single weight at the centroid. Therefore, to find the centroid of an area, it is necessary to find the position of the centroid of the bending moment.\nFigure 38. Centre of Gravity or Centroid.\nThis figure illustrates the concept of the centre of gravity or centroid for various shapes, indicating their geometric properties 
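
To get a feel for why page 228 comes first, here is a toy text ranker written with the standard library only. It is not how `minsearch` is implemented; it is a simplified TF-IDF-style scorer under assumed tokenization and weighting, and the names `tokenize` and `rank` as well as the three demo documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: list[dict], field: str = "content", num_results: int = 3) -> list[dict]:
    """Return the docs that share terms with the query, best match first."""
    term_counts = [Counter(tokenize(doc[field])) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of docs each term occurs in
    doc_freq = Counter(term for counts in term_counts for term in counts)
    query_terms = set(tokenize(query))

    scored = []
    for doc, counts in zip(docs, term_counts):
        # Term frequency times a smoothed inverse document frequency
        score = sum(
            counts[term] * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term in query_terms
        )
        if score > 0:
            scored.append((score, doc))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


demo_docs = [
    {"filename": "a.json", "content": "The centroid of an area is its balancing point."},
    {"filename": "b.json", "content": "The total differential dz depends on dx, dy."},
    {"filename": "c.json", "content": "Centre of gravity and centroid of a triangle."},
]

demo_results = rank("Centre of Gravity and Centroid", demo_docs)
print([doc["filename"] for doc in demo_results])  # ['c.json', 'a.json']
```

Pages that contain more of the query terms, and rarer ones, score higher; pages with no overlap are dropped.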

In [55]:
import sys
sys.path.append('..')  # add the parent directory to the import path so `import rag` below can find the module

In [59]:
import rag
from openai import OpenAI

openai_client = OpenAI()

In [60]:
book_rag = rag.RAG(
    index=index,
    llm_client=openai_client
)

In [62]:
response = book_rag.rag('double sum curve')

In [64]:
print(response.answer)

The term "double sum curve" refers to a method for finding the centroid vertical of an area defined by a sum curve. This method is primarily graphic and involves the use of polar distances to establish relationships between various points on the curve and its centroid.

To utilize the double sum curve method, one would extend the original curve to determine necessary dimensions and facilitate the calculation of areas related to the centroid's position. Specific procedures include defining the areas represented by the original sum curve and establishing the centroids based on the areas involved.

For further details, a visual diagram usually accompanies this method, illustrating the relationships and procedural steps involved in defining the centroid vertical effectively.

Overall, the double sum curve technique is useful but can be lengthy, making it less practical for quick calculations of centroids.
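
The `rag` module itself is not shown in this notebook. The sketch below illustrates the usual search, prompt, LLM flow behind a call like `book_rag.rag(...)`, under stated assumptions: `MiniRAG`, `RAGResponse`, `FakeIndex`, and `fake_llm` are all hypothetical, and the LLM is reduced to a plain callable so the example runs without an API key.

```python
from dataclasses import dataclass


@dataclass
class RAGResponse:
    answer: str
    sources: list[str]


class MiniRAG:
    """Hypothetical search -> prompt -> LLM pipeline; not the notebook's `rag.RAG`."""

    def __init__(self, index, llm, num_results: int = 3):
        self.index = index  # any object with .search(query, num_results=...)
        self.llm = llm      # callable: prompt string -> answer string
        self.num_results = num_results

    def build_prompt(self, question: str, docs: list[dict]) -> str:
        # Put each retrieved page into the context, labeled with its filename
        context = "\n\n".join(f"[{d['filename']}]\n{d['content']}" for d in docs)
        return (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )

    def rag(self, question: str) -> RAGResponse:
        docs = self.index.search(question, num_results=self.num_results)
        prompt = self.build_prompt(question, docs)
        return RAGResponse(
            answer=self.llm(prompt),
            sources=[d["filename"] for d in docs],
        )


class FakeIndex:
    """Stand-in for a search index: returns the first documents it holds."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, query, num_results=3):
        return self.docs[:num_results]


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: reports how many pages reached the prompt."""
    return f"(stub answer based on {prompt.count('[output/')} page(s))"


demo_index = FakeIndex([
    {"filename": "output/page_228.json", "content": "Centre of Gravity and Centroid."},
    {"filename": "output/page_229.json", "content": "Finding the centroid vertical."},
])

demo_response = MiniRAG(index=demo_index, llm=fake_llm).rag("double sum curve")
print(demo_response.answer)
print(demo_response.sources)
```

Returning the source filenames alongside the answer makes it easy to check which pages the answer was grounded in.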
