Re-initialize tutorial documentation #464

Open: lfunderburk wants to merge 23 commits into `main` from `guide-migration`.

lfunderburk (Contributor):

Reference code in the Markdown from `.py` files.


codspeed-hq bot commented May 15, 2024

CodSpeed Performance Report

Merging #464 will not alter performance

Comparing guide-migration (50cd5eb) with main (d7763d3)

Summary

✅ 6 untouched benchmarks

davidselassie (Contributor) left a comment:

Thanks for working on this!

Because I have spent too much time fighting with Sphinx, I have a lot of little formatting suggestions that play into the capabilities provided by our Sphinx theme. I think we should use those instead of hand-rolled HTML, because that will make it easier to change all of the formatting consistently in the future via the theme if we need to.

I'd encourage you to read through the "Authoring" articles in the MyST docs (like "Admonitions", etc. in the sidebar), the "Content and Features" articles in the PyData Theme docs (like "Theme-specific Elements", etc. in the sidebar), and the Sphinx Design docs for all the kinds of structures and visual features we have access to.

```python
user_event_map = op.map("user_event", inp, user_event)

# Configure the event clock and session windower
event_time_config: EventClock = EventClock(
```
Contributor:

Suggested change:

```diff
-event_time_config: EventClock = EventClock(
+event_time_config = EventClock(
```

I don't think this needs an annotation, since the variable has the same type as the constructor call. Does mypy complain otherwise?

Contributor (author):


Yes, it complained.

Contributor:


Huh. Not sure why. Fine.

Resolved review threads: eight on docs/tutorials/search-logs/index.md (outdated) and one on docs/tutorials/search-logs/dataflow.py.
lfunderburk and others added 15 commits May 15, 2024 17:13
Co-authored-by: David Selassie <david@bytewax.io>
Signed-off-by: Laura Gutierrez Funderburk <lfunderburk@users.noreply.github.com>
lfunderburk (Contributor, author):

[Screenshot 2024-05-17 at 9:37:41 AM]

Hi @davidselassie, will this be fixed when it's merged?

I added it exactly as you suggested:

```
:substitutions:
$ pip install bytewax==|version|
```

davidselassie (Contributor):

> will this be fixed when it's merged?

As in, will it show the correct released version? Yes. The whole point is that if someone is looking at pre-release docs, they get a hint that this doesn't work with the current stable version of Bytewax. Or did you mean something else?
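As a rough model of what the `|version|` placeholder does at build time, the hypothetical helper below swaps each `|name|` for a value. It is not how Sphinx or its substitution extension is implemented, and the version strings are made up: the point is only that docs built from a pre-release render a pre-release pin, which is the hint described above.

```python
import re
from typing import Mapping


def expand_substitutions(text: str, values: Mapping[str, str]) -> str:
    """Replace each |name| placeholder with values[name].

    Hypothetical helper for illustration only; placeholders without a
    matching value are left untouched.
    """
    return re.sub(
        r"\|(\w+)\|",
        lambda m: values.get(m.group(1), m.group(0)),
        text,
    )


# Made-up version strings: a stable build versus a pre-release build.
stable = expand_substitutions("$ pip install bytewax==|version|", {"version": "0.20.0"})
pre = expand_substitutions("$ pip install bytewax==|version|", {"version": "0.21.0a1"})
```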

lfunderburk (Contributor, author):

> As in, will it show the correct released version? Yes

Cool! Yes, that's what I meant.

OK, the PR is ready for re-review.

```markdown
| Basic, no prior knowledge requirement | Approx. 25 Min | Beginner |

## Your Takeaway
```
Contributor:


Oops. I forgot that since this is the first "tutorial", we'll need to add the new template path pattern to `secondary_sidebar_items` in `/docs/conf.py` to get the right sidebar to show up. Could you change that setting to this?

```python
html_theme_options = {
    # ... other theme options stay as they are ...
    "secondary_sidebar_items": {
        "api/**": ["page-toc"],
        "guide/**": ["page-toc", "edit-this-page"],
        "tutorials/**": ["page-toc", "edit-this-page"],
    },
}
```

Unfortunately, there isn't a clean way to express "API docs don't have edit-this-page, but everything else does" with a fallback pattern, because the builder errors if the patterns overlap.
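One way to see the overlap problem is to check which patterns claim a given page. This sketch uses Python's `fnmatch` as a stand-in for the theme's own glob matching (an assumption; the real matcher may treat `**` differently), and the `"**"` fallback mapping is hypothetical, included only to show the conflict.

```python
from fnmatch import fnmatch
from typing import Dict, List

# The mapping suggested above for docs/conf.py.
SUGGESTED: Dict[str, List[str]] = {
    "api/**": ["page-toc"],
    "guide/**": ["page-toc", "edit-this-page"],
    "tutorials/**": ["page-toc", "edit-this-page"],
}

# Hypothetical alternative with a catch-all fallback; this is the kind of
# overlap the builder rejects.
WITH_FALLBACK: Dict[str, List[str]] = {
    "**": ["page-toc", "edit-this-page"],
    "api/**": ["page-toc"],
}


def matching_patterns(docname: str, patterns: Dict[str, List[str]]) -> List[str]:
    """Return every pattern that matches the document name.

    fnmatch's "*" also matches "/", so this only approximates Sphinx-style
    globbing, but it is enough to show when two patterns both apply.
    """
    return [pattern for pattern in patterns if fnmatch(docname, pattern)]
```

With the suggested mapping each page matches exactly one pattern, which is why it lists every section explicitly instead of relying on a fallback.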

Resolved review threads: two on docs/tutorials/search-logs/index.md (outdated).

```markdown
Before we begin, let's import the necessary modules and set up the environment for building the dataflow.

Complete installation - we recommend using a virtual environment to manage your Python dependencies. You can install Bytewax using pip:
```
Contributor:


Should we xref to <project:#xref-installing> so that people who want more details can dive deeper?

Resolved review thread: docs/tutorials/search-logs/dataflow.py (outdated).
```markdown
:end-before: end-feed-input
:lineno-match:
```

Contributor:

Suggested change:

```markdown
## Sessions are Per-User
```

The core purpose of this step is to make the user ID the key of the stream, so that session windows are calculated per user and not over some other grouping. I think it's important to mention very briefly in the body of this section how state keys work and why they're needed.
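As a plain-Python illustration of why the key matters (not Bytewax's windowing implementation), this sketch first groups events by user ID and only then splits each user's events into sessions wherever the gap between consecutive events exceeds a timeout. The `Event` shape, the timestamps, and the five-second gap are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape; the tutorial's real classes may differ."""

    user_id: int
    ts: datetime
    kind: str  # e.g. "search" or "click"


def sessionize_per_user(
    events: List[Event], gap: timedelta
) -> Dict[str, List[List[Event]]]:
    """Key events by user ID, then split each user's events into sessions.

    A new session starts whenever the time between two consecutive events
    for the same user is greater than `gap`.
    """
    # Route every event to its user's bucket; the string user ID plays the
    # role of the state key.
    by_user: Dict[str, List[Event]] = {}
    for event in events:
        by_user.setdefault(str(event.user_id), []).append(event)

    sessions: Dict[str, List[List[Event]]] = {}
    for key, user_events in by_user.items():
        user_events.sort(key=lambda e: e.ts)
        current = [user_events[0]]
        user_sessions = [current]
        for prev, nxt in zip(user_events, user_events[1:]):
            if nxt.ts - prev.ts > gap:
                # Gap exceeded: close the current session and open a new one.
                current = [nxt]
                user_sessions.append(current)
            else:
                current.append(nxt)
        sessions[key] = user_sessions
    return sessions


t0 = datetime(2024, 5, 15, 12, 0, 0)
events = [
    Event(1, t0, "search"),
    Event(2, t0 + timedelta(seconds=1), "search"),
    Event(1, t0 + timedelta(seconds=2), "click"),
    Event(1, t0 + timedelta(seconds=30), "search"),
]
sessions = sessionize_per_user(events, gap=timedelta(seconds=5))
```

Because each user's events are windowed separately, user 2's search never lands in user 1's first session, even though it arrives between user 1's search and click.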

Resolved review thread: docs/tutorials/search-logs/index.md (outdated).
Comment on lines +142 to +149:

````markdown
2. The `calculate_ctr` function will calculate the Click-Through Rate (CTR) for each search session based on the click activity in the session.

```{literalinclude} dataflow.py
:language: python
:start-after: start-calc-ctr
:end-before: end-calc-ctr
:lineno-match:
```
````
Contributor:

This should be moved later, after we've covered the windowing steps and defined the sessions, since we don't use it until we calculate the metrics.
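For reference, here is a minimal sketch of the kind of calculation `calculate_ctr` performs, assuming a session is a list of search and click events and CTR is taken as clicks divided by searches. The `Search` and `ClickResult` classes are hypothetical stand-ins; the real definitions live in `dataflow.py` and may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Search:
    """Hypothetical search event."""

    query: str


@dataclass(frozen=True)
class ClickResult:
    """Hypothetical click-on-a-result event."""

    link: str


SessionEvent = Union[Search, ClickResult]


def calculate_ctr(user_session: Tuple[str, List[SessionEvent]]) -> Tuple[str, float]:
    """Return (user_id, clicks / searches) for one search session."""
    user_id, events = user_session
    searches = sum(1 for e in events if isinstance(e, Search))
    clicks = sum(1 for e in events if isinstance(e, ClickResult))
    # Guard against sessions with no searches to avoid dividing by zero.
    ctr = clicks / searches if searches else 0.0
    return user_id, ctr
```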


```markdown
### Returning results

Finally, we can add an output step to our dataflow to return the results of the CTR calculation. This step will emit the CTR for each search session, providing a comprehensive overview of user engagement with search results.
```
Contributor:

Just as we explained that `TestingSource` is only there to show conceptually how to get data into the dataflow, we should say that `inspect` is an operator that dumps a stream of data to stdout for prototyping, but isn't what you'd actually use in a production setup.
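To illustrate the distinction, here is a plain-Python stand-in for a debugging step like `inspect`: it passes every item through unchanged and echoes it, labelled with a step ID. This is a hypothetical analogy, not Bytewax's implementation or its exact output format; a production dataflow would write to a real sink instead, much as the in-memory list below only plays the role of a testing source.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def inspect_items(
    step_id: str,
    items: Iterable[T],
    echo: Callable[[str], None] = print,
) -> Iterator[T]:
    """Yield each item unchanged after echoing it with its step ID.

    Hypothetical stand-in for a debugging step: handy while prototyping,
    but by default the echoed text is only printed to stdout, not stored.
    """
    for item in items:
        echo(f"{step_id}: {item!r}")
        yield item


# An in-memory list stands in for a testing source here.
results = list(inspect_items("inspect_ctr", [("1", 0.5), ("2", 0.0)]))
```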

Resolved review thread: docs/tutorials/search-logs/index.md (outdated).
lfunderburk and others added 4 commits May 17, 2024 14:55
Co-authored-by: David Selassie <david@bytewax.io>
Signed-off-by: Laura Gutierrez Funderburk <lfunderburk@users.noreply.github.com>

Labels: None yet
Projects: None yet
Development: no linked issues yet
Participants: 2