Add Cambridge latex style resources #2187

Merged · 29 commits · Aug 4, 2022

Commits (changes from all commits)
- 4c9fdc7 fix the dataset download issue of sentiment analysis (cheungdaven, Jun 23, 2022)
- 0f6784a triggering evaluation of sentiment-analysis-cnn/rnn (cheungdaven, Jun 23, 2022)
- 0c6d5b4 Merge branch 'd2l-ai:master' into master (cheungdaven, Jun 28, 2022)
- 8ea5644 Merge branch 'd2l-ai:master' into master (cheungdaven, Jun 30, 2022)
- dae85cd Merge branch 'd2l-ai:master' into master (cheungdaven, Jul 6, 2022)
- accf55c add cambridge latex style resources (cheungdaven, Jul 6, 2022)
- 79fb890 add the original zip file of cambridge style (cheungdaven, Jul 6, 2022)
- bd0f496 Merge branch 'd2l-ai:master' into latex (cheungdaven, Jul 8, 2022)
- 2ac58ef add url for PT1.zip (cheungdaven, Jul 8, 2022)
- 25db095 enable cambridge latex style (cheungdaven, Jul 8, 2022)
- 2542529 disable cambridge latex style (cheungdaven, Jul 8, 2022)
- 9f641fe Merge branch 'd2l-ai:master' into latex (cheungdaven, Jul 15, 2022)
- b60694e fix compatibility and style issues (cheungdaven, Jul 15, 2022)
- 044da7e Merge branch 'master' into latex (astonzhang, Jul 28, 2022)
- becfaa3 Delete sphinx.sty (cheungdaven, Aug 3, 2022)
- 798bbc9 Update sphinxlatexobjects.sty (cheungdaven, Aug 3, 2022)
- d51b05f Update sphinxlatexlists.sty (cheungdaven, Aug 3, 2022)
- bd0b665 Update sphinxpackagefootnote.sty (cheungdaven, Aug 3, 2022)
- 62d0bb8 Delete sphinxlatexstyletext.sty (cheungdaven, Aug 3, 2022)
- 6e4a144 Update config.ini (cheungdaven, Aug 3, 2022)
- c8a039a Update config.ini (cheungdaven, Aug 3, 2022)
- a081d32 test cambridge style (cheungdaven, Aug 3, 2022)
- 6d17146 Update Jenkinsfile (cheungdaven, Aug 3, 2022)
- 2ea88a0 replace special characters (cheungdaven, Aug 4, 2022)
- 8cc9879 replace special character ~ (cheungdaven, Aug 4, 2022)
- f0b6acb replace special character ~ (cheungdaven, Aug 4, 2022)
- 3335504 Update mf.md (cheungdaven, Aug 4, 2022)
- 79c122c Update sphinxlatexlists.sty (cheungdaven, Aug 4, 2022)
- 778f79c disable cambridge style (cheungdaven, Aug 4, 2022)
6 changes: 3 additions & 3 deletions chapter_computational-performance/hardware.md
@@ -7,8 +7,8 @@ We will start by looking at computers. Then we will zoom in to look more carefully
![Latency Numbers that every programmer should know.](../img/latencynumbers.png)
:label:`fig_latencynumbers`

- Impatient readers may be able to get by with :numref:`fig_latencynumbers`. It is taken from Colin Scott's [interactive post](https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html) that gives a good overview of the progress over the past decade. The original numbers are due to Jeff Dean's [Stanford talk from 2010](https://static.googleusercontent.com/media/research.google.com/en//people/jeff/Stanford-DL-Nov-2010.pdf).
- The discussion below explains some of the rationale for these numbers and how they can guide us in designing algorithms. This discussion is very high level and cursory. It is clearly *no substitute* for a proper course but rather just meant to provide enough information for a statistical modeler to make suitable design decisions. For an in-depth overview of computer architecture we refer the reader to :cite:`Hennessy.Patterson.2011` or a recent course on the subject, such as the one by [Krste Asanovic](http://inst.eecs.berkeley.edu/~cs152/sp19/).
+ Impatient readers may be able to get by with :numref:`fig_latencynumbers`. It is taken from Colin Scott's [interactive post](https://people.eecs.berkeley.edu/%7Ercs/research/interactive_latency.html) that gives a good overview of the progress over the past decade. The original numbers are due to Jeff Dean's [Stanford talk from 2010](https://static.googleusercontent.com/media/research.google.com/en//people/jeff/Stanford-DL-Nov-2010.pdf).
+ The discussion below explains some of the rationale for these numbers and how they can guide us in designing algorithms. This discussion is very high level and cursory. It is clearly *no substitute* for a proper course but rather just meant to provide enough information for a statistical modeler to make suitable design decisions. For an in-depth overview of computer architecture we refer the reader to :cite:`Hennessy.Patterson.2011` or a recent course on the subject, such as the one by [Krste Asanovic](http://inst.eecs.berkeley.edu/%7Ecs152/sp19/).

## Computers

@@ -35,7 +35,7 @@ At its most basic memory is used to store data that needs to be readily accessible
While these numbers are impressive, indeed, they only tell part of the story. When we want to read a portion from memory we first need to tell the memory module where the information can be found. That is, we first need to send the *address* to RAM. Once this is accomplished we can choose to read just a single 64 bit record or a long sequence of records. The latter is called *burst read*. In a nutshell, sending an address to memory and setting up the transfer takes approximately 100 ns (details depend on the specific timing coefficients of the memory chips used), every subsequent transfer takes only 0.2 ns. In short, the first read is 500 times as expensive as subsequent ones! Note that we could perform up to 10,000,000 random reads per second. This suggests that we avoid random memory access as far as possible and use burst reads (and writes) instead.
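To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python using the approximate figures quoted above (100 ns to set up a transfer, 0.2 ns per subsequent 64-bit word); the constants are illustrative, not measurements:

```python
# Cost model for DRAM access built from the approximate numbers above.
SETUP_NS = 100.0     # sending the address and setting up the transfer
PER_WORD_NS = 0.2    # each additional 64-bit transfer within a burst

def read_time_ns(num_words, burst=True):
    """Estimated time to read num_words 64-bit words."""
    if burst:
        return SETUP_NS + num_words * PER_WORD_NS
    # Random access pays the setup cost for every single word.
    return num_words * (SETUP_NS + PER_WORD_NS)

words = 1_000_000  # 8 MB of data
print(f"burst : {read_time_ns(words) / 1e6:.2f} ms")
print(f"random: {read_time_ns(words, burst=False) / 1e6:.2f} ms")
# The ratio approaches (100 + 0.2) / 0.2, i.e., about 500, matching the claim above.
```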

Matters are a bit more complex when we take into account that we have multiple *banks*. Each bank can read memory largely independently. This means two things.
- On the one hand, the effective number of random reads is up to 4 times higher, provided that they are spread evenly across memory. It also means that it is still a bad idea to perform random reads since burst reads are 4 times faster, too. On the other hand, due to memory alignment to 64 bit boundaries it is a good idea to align any data structures with the same boundaries. Compilers do this pretty much [automatically](https://en.wikipedia.org/wiki/Data_structure_alignment) when the appropriate flags are set. Curious readers are encouraged to review a lecture on DRAMs such as the one by [Zeshan Chishti](http://web.cecs.pdx.edu/~zeshan/ece585_lec5.pdf).
+ On the one hand, the effective number of random reads is up to 4 times higher, provided that they are spread evenly across memory. It also means that it is still a bad idea to perform random reads since burst reads are 4 times faster, too. On the other hand, due to memory alignment to 64 bit boundaries it is a good idea to align any data structures with the same boundaries. Compilers do this pretty much [automatically](https://en.wikipedia.org/wiki/Data_structure_alignment) when the appropriate flags are set. Curious readers are encouraged to review a lecture on DRAMs such as the one by [Zeshan Chishti](http://web.cecs.pdx.edu/%7Ezeshan/ece585_lec5.pdf).
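The cost of random versus burst-friendly access is visible even from Python, through many layers of abstraction; a minimal sketch with NumPy (the array size and the size of the observed gap will vary by machine, and the gather also pays an allocation cost):

```python
import time
import numpy as np

a = np.random.rand(10_000_000)        # ~80 MB, larger than typical caches
idx = np.random.permutation(len(a))   # a random access pattern

start = time.time()
s_seq = a.sum()                       # sequential scan: burst-friendly
t_seq = time.time() - start

start = time.time()
s_rnd = a[idx].sum()                  # random gather over the same data
t_rnd = time.time() - start

print(f"sequential: {t_seq:.3f} s, random: {t_rnd:.3f} s")
```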

GPU memory is subject to even higher bandwidth requirements since they have many more processing elements than CPUs. By and large there are two options to address them. The first is to make the memory bus significantly wider. For instance, NVIDIA's RTX 2080 Ti has a 352-bit-wide bus. This allows for much more information to be transferred at the same time. Second, GPUs use specific high-performance memory. Consumer-grade devices, such as NVIDIA's RTX and Titan series typically use [GDDR6](https://en.wikipedia.org/wiki/GDDR6_SDRAM) chips with over 500 GB/s aggregate bandwidth. An alternative is to use HBM (high bandwidth memory) modules. They use a very different interface and connect directly with GPUs on a dedicated silicon wafer. This makes them very expensive and their use is typically limited to high-end server chips, such as the NVIDIA Volta V100 series of accelerators. Quite unsurprisingly, GPU memory is generally *much* smaller than CPU memory due to the higher cost of the former. For our purposes, by and large their performance characteristics are similar, just a lot faster. We can safely ignore the details for the purpose of this book. They only matter when tuning GPU kernels for high throughput.

2 changes: 1 addition & 1 deletion chapter_convolutional-modern/vgg.md
@@ -23,7 +23,7 @@ individual neurons to whole layers,
and now to blocks, repeating patterns of layers.

The idea of using blocks first emerged from the
- [Visual Geometry Group](http://www.robots.ox.ac.uk/~vgg/) (VGG)
+ [Visual Geometry Group](http://www.robots.ox.ac.uk/%7Evgg/) (VGG)
at Oxford University,
in their eponymously-named *VGG* network :cite:`Simonyan.Zisserman.2014`.
It is easy to implement these repeated structures in code
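As an illustration, a sketch of such a block builder along the lines of the book's PyTorch implementation (the exact signature here is for illustration):

```python
import torch.nn as nn

def vgg_block(num_convs, in_channels, out_channels):
    """num_convs 3x3 convolutions (padding 1) with ReLU, then 2x2 max-pooling."""
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
```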
2 changes: 1 addition & 1 deletion chapter_introduction/index.md
@@ -706,7 +706,7 @@ the one that you are going to use for your decision.
Assume that you find a beautiful mushroom in your backyard
as shown in :numref:`fig_death_cap`.

- ![Death cap---do not eat!](../img/death-cap.jpg)
+ ![Death cap - do not eat!](../img/death-cap.jpg)
:width:`200px`
:label:`fig_death_cap`

@@ -32,7 +32,7 @@ as a text classification task,
which transforms a varying-length text sequence
into a fixed-length text category.
In this chapter,
- we will use Stanford's [large movie review dataset](https://ai.stanford.edu/~amaas/data/sentiment/)
+ we will use Stanford's [large movie review dataset](https://ai.stanford.edu/%7Eamaas/data/sentiment/)
for sentiment analysis.
It consists of a training set and a testing set,
each containing 25,000 movie reviews downloaded from IMDb.
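Once the archive is extracted, the reviews are plain-text files grouped by split and label; a minimal loader sketch (the directory layout follows the dataset's standard `aclImdb` distribution, and the local path is an assumption):

```python
import os

def read_imdb(data_dir, split='train'):
    """Return (texts, labels), with label 1 for positive and 0 for negative."""
    texts, labels = [], []
    for label, folder in ((1, 'pos'), (0, 'neg')):
        folder_path = os.path.join(data_dir, split, folder)
        for fname in os.listdir(folder_path):
            with open(os.path.join(folder_path, fname), encoding='utf-8') as f:
                texts.append(f.read())
            labels.append(label)
    return texts, labels

# Hypothetical usage, assuming the archive was extracted under ../data:
# train_texts, train_labels = read_imdb('../data/aclImdb', 'train')
```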
2 changes: 1 addition & 1 deletion chapter_optimization/sgd.md
@@ -151,7 +151,7 @@ lr = polynomial_lr
d2l.show_trace_2d(f, d2l.train_2d(sgd, steps=50, f_grad=f_grad))
```

- There exist many more choices for how to set the learning rate. For instance, we could start with a small rate, then rapidly ramp up and then decrease it again, albeit more slowly. We could even alternate between smaller and larger learning rates. There exists a large variety of such schedules. For now let's focus on learning rate schedules for which a comprehensive theoretical analysis is possible, i.e., on learning rates in a convex setting. For general nonconvex problems it is very difficult to obtain meaningful convergence guarantees, since in general minimizing nonlinear nonconvex problems is NP hard. For a survey see e.g., the excellent [lecture notes](https://www.stat.cmu.edu/~ryantibs/convexopt-F15/lectures/26-nonconvex.pdf) of Tibshirani 2015.
+ There exist many more choices for how to set the learning rate. For instance, we could start with a small rate, then rapidly ramp up and then decrease it again, albeit more slowly. We could even alternate between smaller and larger learning rates. There exists a large variety of such schedules. For now let's focus on learning rate schedules for which a comprehensive theoretical analysis is possible, i.e., on learning rates in a convex setting. For general nonconvex problems it is very difficult to obtain meaningful convergence guarantees, since in general minimizing nonlinear nonconvex problems is NP hard. For a survey see e.g., the excellent [lecture notes](https://www.stat.cmu.edu/%7Eryantibs/convexopt-F15/lectures/26-nonconvex.pdf) of Tibshirani 2015.
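The "ramp up, then decay more slowly" idea mentioned above can be written as a small schedule function; a sketch where the warmup length, base rate, and decay exponent are illustrative choices, not one of the schedules analyzed in this section:

```python
def warmup_then_decay_lr(t, base_lr=0.5, warmup=10):
    """Ramp the rate up linearly for `warmup` steps, then decay as O(1/sqrt(t))."""
    if t < warmup:
        return base_lr * (t + 1) / warmup
    return base_lr * (warmup / (t + 1)) ** 0.5

# The rate rises during warmup, peaks, then falls off slowly:
print([round(warmup_then_decay_lr(t), 3) for t in range(0, 50, 5)])
```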



2 changes: 1 addition & 1 deletion chapter_recommender-systems/mf.md
@@ -1,7 +1,7 @@
# Matrix Factorization

Matrix Factorization :cite:`Koren.Bell.Volinsky.2009` is a well-established algorithm in the recommender systems literature. The first version of the matrix factorization model was proposed by Simon Funk in a famous [blog
- post](https://sifter.org/~simon/journal/20061211.html) in which he described the idea of factorizing the interaction matrix. It then became widely known due to the Netflix contest which was held in 2006. At that time, Netflix, a media-streaming and video-rental company, announced a contest to improve its recommender system performance. The best team that could improve on the Netflix baseline (i.e., Cinematch) by 10 percent would win a one million USD prize. As such, this contest attracted
+ post](https://sifter.org/%7Esimon/journal/20061211.html) in which he described the idea of factorizing the interaction matrix. It then became widely known due to the Netflix contest which was held in 2006. At that time, Netflix, a media-streaming and video-rental company, announced a contest to improve its recommender system performance. The best team that could improve on the Netflix baseline (i.e., Cinematch) by 10 percent would win a one million USD prize. As such, this contest attracted
a lot of attention to the field of recommender system research. Subsequently, the grand prize was won by the BellKor's Pragmatic Chaos team, a combined team of BellKor, Pragmatic Theory, and BigChaos (you do not need to worry about these algorithms now). Although the final score was the result of an ensemble solution (i.e., a combination of many algorithms), the matrix factorization algorithm played a critical role in the final blend. The technical report of the Netflix Grand Prize solution :cite:`Toscher.Jahrer.Bell.2009` provides a detailed introduction to the adopted model. In this section, we will dive into the details of the matrix factorization model and its implementation.
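In its basic form the model predicts a rating as the inner product of a user factor and an item factor; a minimal SGD sketch of that idea (the factor dimension, learning rate, and regularization below are illustrative choices, not values from the Netflix Prize report):

```python
import numpy as np

def mf_sgd(ratings, num_users, num_items, k=10, lr=0.05, reg=0.1, epochs=100):
    """ratings: list of (user, item, rating) triples. Returns factors P, Q."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(num_users, k))   # user factors
    Q = rng.normal(scale=0.1, size=(num_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            pu = P[u].copy()                         # keep old value for both updates
            err = r - pu @ Q[i]                      # prediction error
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Toy usage on three observed ratings:
P, Q = mf_sgd([(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)], num_users=2, num_items=2)
print(P @ Q.T)  # predicted rating matrix, including the unobserved entry
```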


3 changes: 2 additions & 1 deletion config.ini
@@ -23,7 +23,7 @@ release = 1.0.0-alpha0
notebooks = *.md */*.md

# A list of files that will be copied to the build folder.
- resources = img/ d2l/ d2l.bib setup.py
+ resources = img/ d2l/ d2l.bib setup.py latex_style/

# Files that will be skipped.
exclusions = README.md STYLE_GUIDE.md INFO.md CODE_OF_CONDUCT.md CONTRIBUTING.md contrib/*md
@@ -57,6 +57,7 @@ include_css = static/d2l.css

# The file used to post-process the generated tex file.
post_latex = ./static/post_latex/main.py
+ latex_url = https://d2l-webdata.s3.us-west-2.amazonaws.com/latex-styles/PT1.zip

latex_logo = static/logo.png
main_font = Source Serif Pro
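The new `latex_url` key points at a zip of the Cambridge style files; presumably the build fetches and unpacks it next to the generated tex. A minimal sketch of such a step (the destination directory, and the idea that the build consumes the key this way, are assumptions, not taken from this diff):

```python
import io
import urllib.request
import zipfile

def fetch_latex_styles(url, dest='_build/pdf'):
    """Download the style zip from `url` and extract it into `dest`."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    zipfile.ZipFile(io.BytesIO(data)).extractall(dest)

fetch_latex_styles(
    'https://d2l-webdata.s3.us-west-2.amazonaws.com/latex-styles/PT1.zip')
```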
298 changes: 298 additions & 0 deletions latex_style/PT1header.eps

(Generated file; contents not rendered.)