Conversion to Markdown #5

Closed
norvig opened this issue Feb 16, 2018 · 30 comments

Comments

@norvig
Contributor

norvig commented Feb 16, 2018

A few ideas:

  • Hopefully, pandoc can be used to convert to markdown. Perhaps it won't be perfect and there will need to be some hand-editing.
  • I'm worried about math equations. Will MathJax work? Some relevant research here, here, here.
    There's mathTex for rendering images. Or this:
    [image: equation]

  • Or we could put exercises in Jupyter notebooks (see here).
  • Each exercise will need a persistent name/number. We'll need a convention for this.
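
One possible shape for such a convention, sketched in Python: build each identifier from the chapter name plus a short slug that is never renumbered, instead of from the exercise's position in the chapter. The `ExerciseId` class, the `chapter/slug` format, and the example names are all assumptions made for illustration, not something decided in this thread.

```python
import re
from dataclasses import dataclass

# Hypothetical format: "<chapter>/<slug>", lowercase words joined by hyphens,
# e.g. "search/route-finding".
_ID_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*/[a-z0-9]+(?:-[a-z0-9]+)*$")


@dataclass(frozen=True)
class ExerciseId:
    chapter: str  # assumed to match the chapter's file stem, e.g. "search"
    slug: str     # short human-chosen name that stays fixed once assigned

    def __str__(self) -> str:
        return f"{self.chapter}/{self.slug}"

    @classmethod
    def parse(cls, text: str) -> "ExerciseId":
        """Parse "<chapter>/<slug>" and reject anything else."""
        if not _ID_PATTERN.match(text):
            raise ValueError(f"not a valid exercise id: {text!r}")
        chapter, slug = text.split("/")
        return cls(chapter, slug)
```

Because the identifier does not encode a position, exercises could be added, removed, or reordered without breaking links to the ones that remain.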

@Nalinc
Collaborator

Nalinc commented Feb 16, 2018

+1 for using Jupyter notebooks.
The markdown parser included in Jupyter is MathJax-aware. This will cover the mathematical equations.

Further, Jupyter will also give us access to nbconvert, which allows conversion of notebook files (.ipynb) into other static formats (Markdown, HTML, LaTeX, PDF). nbconvert also uses pandoc internally.

There are similar initiatives like Python Textbook Companion Project, which uses Jupyter notebooks to put popular textbooks on the web (see [1] for example).

[1] Jupyter notebook for "Programming in C by Stephen G. Kochan", http://tbc-python.fossee.in/book-details/36/

@Dragneel7
Contributor

Use of Jupyter notebooks is a good idea. I tried pandoc and the results were not very good. Please correct me if I am wrong, but the aim of the project is to transfer the exercises to an online platform, so why not make a website that contains all the exercises? That way we could also rank the exercises and provide portals for teachers and students.

@sambitdash

I just used a script to convert all the files to Markdown. You can see the files at:
https://github.com/sambitdash/aima-exercises/tree/master/md
If you think it will help to use these as a starting point I can submit a pull request.

@Nalinc
Collaborator

Nalinc commented Feb 19, 2018

This is a great start @sambitdash
However, conversion of LaTeX to Markdown is challenging in the sense that we need consistent and beautiful results.
Also, some files seem to be missing. You might want to tweak the script you used, or play around with Pandoc settings to get satisfactory Markdown.
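
For anyone trying different Pandoc settings, a small wrapper can make the experiments repeatable. The sketch below is only an illustration: the function names and directory layout are hypothetical, and it uses just pandoc's basic `-f`, `-t`, and `-o` options, so any extra settings being tested would need to be added to the command list.

```python
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(tex_file: Path, out_dir: Path) -> List[str]:
    """Build a pandoc invocation that converts one LaTeX file to Markdown."""
    out_file = out_dir / (tex_file.stem + ".md")
    return ["pandoc", "-f", "latex", "-t", "markdown",
            str(tex_file), "-o", str(out_file)]


def convert_all(src_dir: Path, out_dir: Path) -> None:
    """Run pandoc on every .tex file in src_dir (pandoc must be on PATH)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for tex_file in sorted(src_dir.glob("*.tex")):
        # check=True stops at the first file pandoc fails on
        subprocess.run(pandoc_command(tex_file, out_dir), check=True)
```

Calling `convert_all(Path("."), Path("md"))` would then regenerate the whole `md` folder after each change to the settings.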

@sambitdash

sambitdash commented Feb 19, 2018

@Nalinc your expectation is understandable. My attempt was to provide a first-cut starting point, not to claim the output is production quality. I had the following reasons for writing the script rather than depending on pandoc or pandoc template building.

  1. This is a one-time effort. Once the data is in the MD format, all future edits can be made in MD. I am not sure if that assumption is correct.
  2. All comments and WIP LaTeX were to be preserved in the MD, as they may be referred to for tracking purposes.
  3. If you look at the raw MD rather than the rendering in GitHub, the generated MD is fairly easy to fix as desired.
  4. GitHub has a very different way of handling comments than other MD formats available. Hence, they do not render very well.
  5. There are 29 .tex files and 29 .md files, so I am not sure which files are missing. If you let me know which files, I can convert and upload them. Anyway, the script is available in the build directory.

pandoc output without a template is almost unusable and it eliminates all comments.

My intent is not to generate production-quality MD from LaTeX. That would be a significant effort, and it may be useful to focus on pandoc in that case. It may be easier to fix the MD files manually from this point, so I do not intend to spend additional cycles on the script. But if the MD files are useful as a base for editing toward better output, I can send a PR.

@norvig If you think these MD files can be useful let me know.

@Nalinc
Collaborator

Nalinc commented Feb 19, 2018

@sambitdash I didn't mean you missed any LaTeX file while running your script. What I meant was that some files seem to be missing from the current folder which, if present, would give better results (like the bibliography and some .sty files, such as the 'customized theapa.sty' mentioned in aima3e.sty). Once they are available, you might want to re-run your script or even tweak something.

@sambitdash

@Nalinc the missing files will not improve my script's performance or quality in any way. The script just extracts the relevant document structure and not style information. It's always hard to convert styles when the two systems are not equivalent in capability. I am not sure, but I have never seen very complex documents written in Markdown, while LaTeX can generate documents as complex as PostScript can.

@heisenbuug
Contributor

Using jupyter notebook is a great idea, as .ipynb files are easier to handle than other extensions. I think we can use pandoc and do the remaining stuff manually...

@Dragneel7
Contributor

I tried using pandoc and jupyter notebook for agents-exercise.
The results were: https://github.com/Dragneel7/aima-exercises-markdown
I had to manually modify the jupyter notebook to resolve some issues not corrected by pandoc. Still, the jupyter notebook looks much better than the simple agents.md file.
@norvig what do you think?

@norvig
Contributor Author

norvig commented Feb 23, 2018

I'm not sure yet. There are many issues.

  • I agree that the jupyter notebooks look better than straight markdown.
  • But I don't think I like the one notebook per chapter approach. I think it would be better to have each question, and each answer, be a separate page, with an index page(s) to link them all together, and probably some kind of #hashtag indexing to find questions on a specific topic.
  • I think of it more as a database of questions/answers, from which we generate pages.
  • Can someone experiment with github pages using MathJax? How hard would it be to re-generate the github pages? Is that too slow a development cycle?
  • Another possibility is to host questions/answers on stackexchange.
  • Another possibility is kaggle.
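
The "database of questions, from which we generate pages" idea could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the `Exercise` record, its field names, and the page layout are hypothetical, and a real version would also need to handle answers.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Exercise:
    exercise_id: str   # hypothetical persistent id, e.g. "search-route-finding"
    question: str      # Markdown source of the question
    tags: List[str] = field(default_factory=list)  # topics for #hashtag lookup


def render_page(ex: Exercise) -> str:
    """Render one exercise as its own Markdown page, with its hashtags last."""
    tag_line = " ".join(f"#{t}" for t in ex.tags)
    return f"# {ex.exercise_id}\n\n{ex.question}\n\n{tag_line}\n"


def build_tag_index(exercises: List[Exercise]) -> Dict[str, List[str]]:
    """Map each tag to the sorted ids of the exercises that carry it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for ex in exercises:
        for tag in ex.tags:
            index[tag].append(ex.exercise_id)
    return {tag: sorted(ids) for tag, ids in sorted(index.items())}
```

With this split, the index pages and the per-question pages are both derived output, so only the exercise records themselves would need to be edited and reviewed.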

@Nalinc
Collaborator

Nalinc commented Feb 23, 2018

Can someone experiment with github pages using MathJax? How hard would it be to re-generate the github pages? Is that too slow a development cycle?

@norvig I generated github pages using a single command, jupyter nbconvert --to html *.ipynb, and pushed them to a separate branch on my repo (gh-pages). Everything (from conversion to pushing the github pages) took no more than 2 minutes. The resulting HTML files are MathJax-aware since they were generated from Jupyter notebooks. The development cycle is definitely not slow (and I believe we can automate this step anyway).
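
As a rough illustration of how the conversion step might be automated, here is a Python sketch that wraps the same nbconvert command. The function names and the `html` output folder are hypothetical, the `--output-dir` option is added only to keep generated files separate from the notebooks, and pushing to the gh-pages branch is left out.

```python
import subprocess
from pathlib import Path
from typing import List


def nbconvert_command(notebooks: List[Path], out_dir: Path) -> List[str]:
    """Build the `jupyter nbconvert --to html` call for a set of notebooks."""
    return (["jupyter", "nbconvert", "--to", "html",
             "--output-dir", str(out_dir)]
            + [str(nb) for nb in notebooks])


def build_site(notebook_dir: Path, out_dir: Path) -> None:
    """Convert every notebook in notebook_dir to HTML (jupyter must be on PATH)."""
    notebooks = sorted(notebook_dir.glob("*.ipynb"))
    if notebooks:
        subprocess.run(nbconvert_command(notebooks, out_dir), check=True)
```

A script like this could be run by hand before each push, or from a CI job, which would keep the regeneration cycle short.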

Following are the results:
https://nalinc.github.io/aima-exercises/html/robotics-exercises.html

Here's the example exercises in HTML+MathJax https://nalinc.github.io/aima-exercises/
And here's a sample in Jekyll https://nalinc.github.io/blog/aima/

@Nalinc
Collaborator

Nalinc commented Feb 23, 2018

But I don't think I like the one notebook per chapter approach. I think it would be better to have each question, and each answer, be a separate page, with an index page(s) to link them all together, and probably some kind of #hashtag indexing to find questions on a specific topic.

@norvig How about we have all answers in a single file as well? All answers will share the same identifier as the question they belong to. Clicking on a specific question will pop up the answer only for that specific question. OR, we can have a flashcard mode where students will see only one question at a time, but all questions will be fetched together initially.

I loved the idea of hashtagging questions.

@heisenbuug
Contributor

@Nalinc github pages are looking good...
I like the flashcard idea, where a student will see only one question at a time. This way we can also make sure that we don't show the answers to all the questions at once, and give students time to work on them...

@heisenbuug
Contributor

@norvig shall I continue to convert .tex files to .ipynb? Or shall I try to explore more options as per the requirements?

@Nalinc
Collaborator

Nalinc commented Feb 23, 2018

@heisenbuug I think after recent merges, we already have Jupyter notebooks for all the chapters. I believe we should look at other options before deciding to settle on one. Another reason why @norvig is hesitant about Jupyter Notebooks is that they are harder to review and edit within Github.

@heisenbuug
Contributor

@Nalinc ohk then, I will try to find some other solution...

@heisenbuug
Contributor

@Nalinc @norvig we can use WordPress. It also has a MathJax plugin, so our problem will be solved... And by using WordPress we can create an index page to navigate between questions, and we will also get a backend through which we can manage stuff easily...

@nvinayvarma189
Contributor

@heisenbuug WordPress can handle these exercises perfectly, but the problem is that it will lose the idea of open source, I guess. When someone creates a WordPress blog, only that person can edit it. If someone else wants to make changes to the exercises, they cannot simply make a PR for the admin of the blog to review. The admin has to add them to the list of people who can manage the blog, and only then will they be able to make changes (removing those changes if they are not appropriate will be a problem too). I think this will be an unavoidable problem. Please correct me if I am wrong.

@heisenbuug
Contributor

heisenbuug commented Feb 23, 2018

@nvinayvarma189 I agree with you...but considering all our requirements I think Wordpress will be a considerable option coz we need indexing which can be best done there...

@nvinayvarma189
Contributor

Yes, but my personal opinion is that this will be comfortable in the initial stage, but as we progress and add the exercises, a ranking system, and similar features, we may not find WordPress a flexible place to implement these things, as we have to follow the template given to us (we cannot customise it completely). I think the #hashtag indexing feature is used to represent a particular fragment of a webpage, and we can implement it too.

@yakout
Contributor

yakout commented Feb 23, 2018

Why don't we use Jekyll? I think it is the best solution; it supports markdown very well and has a lot of features, plus a fast development cycle.

I use LaTeX and graphs in my site, and I generate them using only MathJax and mermaid. Here is an example of some exercises I have put in my Jekyll site (of course we can change the theme as we like).
https://yakout.github.io/general/advanced-search-exercises/ which was generated from markdown files.

@Nalinc
Collaborator

Nalinc commented Feb 23, 2018

@heisenbuug I think we should stick with Github pages for hosting aima-exercises. There are multiple reasons.

  • Wordpress is GPLv2, AIMA-exercises is MIT. MIT is a more permissive license than GPL. You cannot include GPL code in an MIT licensed project (unless the project is under a dual license).
  • We want AIMA-exercises to be a community-driven project. Hosting exercises on Wordpress will break that idea.
  • As @yakout mentioned, Jekyll is good and GitHub Pages are already integrated with it (though they can support any static site generator). So we are covered.

@heisenbuug
Contributor

@Nalinc agreed. Jekyll will be a great idea @yakout

@norvig
Contributor Author

norvig commented Feb 25, 2018

Those Jekyll pages look good, @yakout. I didn't see any MathJax, but I understand it will work.

@heisenbuug
Contributor

heisenbuug commented Feb 26, 2018

@norvig @Nalinc I made github pages but there are some issues.
Also, I was unable to upload some files as they were giving errors.
Repo Link: https://github.com/heisenbuug/minimal-mistakes-jekyll
Website: https://heisenbuug.github.io/minimal-mistakes-jekyll/
Go through the readme.md file for more information

@Nalinc
Collaborator

Nalinc commented Feb 26, 2018

Looks good, @heisenbuug. However, I don't see mathematical equations rendering correctly in these pages (refer to the image below).

I was unable to upload some files as they were giving errors.

That's probably because Jekyll is throwing a page build failure. It usually happens when tags in markdown are not terminated properly. Can you post the exact error message you're getting?
You can try the markdown files from @nvinayvarma189's fork (until they get merged here; see PR #16). I would suggest focusing on only one or two markdown files for now, ones that have a good number of equations and tables. nlp-english-exercises and search-exercises are some good candidates to test.

[image: screenshot from 2018-02-26 13-00-59]

@yakout
Contributor

yakout commented Feb 26, 2018

@norvig yes, they work perfectly! Here is nlp-english-exercises.

@heisenbuug to see the errors, you need to run Jekyll locally (jekyll serve) and check the errors in the terminal.
Regarding the mathematical equations not rendering correctly, as @Nalinc mentioned, please note that MathJax is not included by default, so you need to include MathJax in your posts:

<script type="text/javascript" async
  src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
</script>

Also note that MathJax does not treat a single $ as a math delimiter by default, so its syntax differs a little from LaTeX: you need to change every single $ to a double $$ (just use replace all).
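
The replace-all step could also be scripted instead of done by hand. The Python sketch below is a heuristic, not a full LaTeX parser: the function name `double_inline_dollars` is hypothetical, and it assumes inline math stays on one line and that literal dollar signs are written as `\$`.

```python
import re

# A single "$" that is neither escaped nor part of "$$", then the math body
# (no "$" and no newline), then a closing single "$" under the same rules.
_SINGLE_DOLLAR_MATH = re.compile(
    r"(?<![\\$])\$(?!\$)([^$\n]+?)(?<!\\)\$(?!\$)"
)


def double_inline_dollars(text: str) -> str:
    """Rewrite $...$ as $$...$$, leaving existing $$...$$ and \\$ untouched."""
    return _SINGLE_DOLLAR_MATH.sub(r"$$\1$$", text)
```

Running it over each markdown file and reviewing the diff before committing would catch the cases where the heuristic guesses wrong, such as prose that contains two unescaped currency amounts on one line.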

Check this useful link for some working examples.

@heisenbuug
Contributor

heisenbuug commented Mar 11, 2018

@yakout where do I have to add that script exactly?

@Nalinc
Collaborator

Nalinc commented Mar 11, 2018

@heisenbuug you need to add it everywhere you expect to have a mathematical equation. The safest way to ensure this is to add it in default.html (or whatever primary template you use for layout). Also, note that just adding this script won't render the equations by itself. You need to 'configure' MathJax too. For this, you can either use a configuration file or include configuration commands within the web page. Here's an example of inline configuration.

<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>
<script type="text/javascript" async
  src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
</script>

@norvig
Contributor Author

norvig commented Mar 11, 2018

@heisenbuug I like your layout too. I noticed that in https://heisenbuug.github.io/minimal-mistakes-jekyll/game-playing-exercises/ there is some MathJax that didn't get converted -- do you know what's up with that? Are you using the Medium layout? Whatever it is, it looks good.

@Nalinc Nalinc closed this as completed Feb 28, 2019
sachin10101998 pushed a commit to sachin10101998/aima-exercises that referenced this issue Jul 14, 2019
…30-a37d-11e9-9e5a-db49bdb8d58e

New Answer by Sachin Chopra for chapter 1 Exercise 2