Diophantine

cirosantilli · Sep 28, 2014 · e069235
1 parent cef1343

Showing 17 changed files with 494 additions and 188 deletions.
diff --git a/README.md b/README.md
@@ -4,5 +4,5 @@ Computer science topics. Mostly practical algorithms and data structures.
 
 Important sections include:
 
-- [motivation.md](motivation.md): beautiful things about computer science
+- [beauty.md](beauty.md): beautiful things about computer science
 - [algorithm.md](algorithm.md): starting point for those learning about algorithms
diff --git a/algorithm.md b/algorithm.md
@@ -58,7 +58,26 @@ The first thing to do is to decide on a computer model to work with.
 
 Classical model.
 
-TODO explain.
+##### Variants
+
+###### Non-deterministic Turing Machine
+
+###### NTM
+
+Turing machine that has multiple possible transitions per input and state.
+
+It decides between those transitions either:
+
+- optimally through a magic oracle.
+- by following all paths at once. TODO: what is the correct output if multiple paths halt?
+
+##### Limitations of Turing machines
+
+While Turing machines accurately describe the decidability of existing systems, they do not model performance so well, for the following reasons:
+
+- modern computers have random access memory. Fortunately, it is simple to model performance by using the so-called RAM computation model.
+
+- out-of-core operations: for very large inputs, it is necessary to store data in slower media such as hard disks. It then becomes necessary to model how much slower those accesses are.
 
 #### RAM model
 
@@ -74,6 +93,8 @@ Sometimes algorithms must operate on data that is too large to fit in RAM, e.g.
 
 Certain algorithms are developed with that restriction in mind, e.g., the B-tree, which is less efficient than binary search trees for in-RAM computing, but much more efficient for out-of-core problems.
 
+There is no simple way of modeling the performance of out-of-core algorithms: we just have to give different weights to certain operations, and then solve complex numerical optimization problems.
+
 #### Input length vs value
 
 Keep in mind that big O analysis uses a Turing machine, so what matters is the *length* of the input, *not* its value.
@@ -225,7 +246,7 @@ If such algorithm is possible, the advantage is obvious: it uses less memory for
 
 ### Free sources
 
-lecture notes:
+Lecture notes:
 
 - <http://webdocs.cs.ualberta.ca/~holte/t26/top.realtop.html>
 - <https://secweb.cs.odu.edu/~zeil/cs361/web/website/directory/page/topics.html>

diff --git a/motivation.md → beauty.md b/motivation.md → beauty.md
@@ -1,4 +1,4 @@
-# Motivation
+# Beauty
 
 Links and short descriptions of beautiful problems in computer science.
 
@@ -307,3 +307,30 @@ They are fun and important to implement solutions using computers.
 -   Differential equations: ordinary/partial.
 
 -   Finite elements.
+
+### Number theory
+
+#### Diophantine equations
+
+##### Hilbert's tenth problem
+
+<https://en.wikipedia.org/wiki/Hilbert%27s_tenth_problem>
+
+Given a Diophantine equation $P(x, y, z, ...) = 0$, where $P$ is a multivariate polynomial with integer coefficients, is there an integer solution?
+
+Famously posed by Hilbert in 1900; the last step of the undecidability proof was completed by Matiyasevich in 1970.
+
+Interesting subset problems include:
+
+-   Fermat's last theorem decides negatively a small subset of Diophantine equations: those of the form $x^n + y^n = z^n$ with $n > 2$ have no solution in positive integers.
+
+-   limiting maximum degree:
+
+    - 1: efficient solution
+    - 2: there is an algorithm: <http://math.stackexchange.com/questions/181380/second-degree-diophantine-equations/181384#comment418090_181384>, but it is not efficient.
+    - 3: unsolved
+    - 4: equivalent to the general problem of degree $n$, so undecidable
+
+##### Reduction of generating equations to 9 variables
+
+If a set is defined by a system of Diophantine equations, it can also be defined by a system of Diophantine equations in only 9 variables (Matiyasevich 1999).
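
The degree 1 case listed above as having an efficient solution can be illustrated with the extended Euclidean algorithm. The following is a minimal Python sketch for the two-variable equation $a x + b y = c$ only; the names `extended_gcd` and `solve_linear_diophantine` are hypothetical and chosen for this example.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g, where |g| == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Invariant: a*old_x + b*old_y == old_r and a*x + b*y == r.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def solve_linear_diophantine(a, b, c):
    """Return one integer solution (x, y) of a*x + b*y == c, or None."""
    if a == 0 and b == 0:
        return (0, 0) if c == 0 else None
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        # c is not a multiple of gcd(a, b): no integer solution exists.
        return None
    k = c // g
    return x * k, y * k


if __name__ == "__main__":
    print(solve_linear_diophantine(6, 15, 9))
    print(solve_linear_diophantine(6, 15, 10))
```

A solution exists exactly when $\gcd(a, b)$ divides $c$, and the loop runs a number of times logarithmic in the inputs, which is why this case counts as efficient.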
diff --git a/context-free.md b/context-free.md
@@ -1,5 +1,7 @@
 # Context-free grammar
 
+Related automaton: PDA.
+
 ## Application
 
 Sufficient for most programming languages, while regexes are not.
 Usually, programming languages use subsets of CFG that are faster to parse,
 most notably deterministic context free grammars,
 which parse in $O(n)$ instead of $O(n^3)$.
 
-## Pushdown automata
+## Complexity
 
-Non deterministic.
+CYK is the most widely used algorithm and recognizes a string of length $n$ in $O(n^3)$.
+It works well in practice, but asymptotically faster algorithms are already known.
 
-## Recognition complexity
+Parsing CFGs and multiplying 0/1 matrices are almost equivalent in time complexity:
 
-CYK algorithm: $O(n^3)$, practically good, but better asymptotic already known.
+-   Valiant (1975) gave a method that converts any matrix multiplication algorithm
+    into a parsing algorithm of the same complexity
 
-Parsing CGFs and multiplying 0/1 matrix algorithms are almost time equivalent.
+-   somewhat conversely, Lee (2002) proved that any parsing algorithm in $O(n^{3-c})$
+    can be converted into a matrix multiplication algorithm of $O(n^{3-c/3})$
 
 Therefore the optimal time is linked to matrix multiplication,
 which is still an open problem, but whose exponent is conjectured to be 2,
 even if the best algorithms known are at around $O(n^{2.37})$
-with huge constant terms
+with huge constant terms.
+
+In practice however, CYK is still the most used algorithm as of 2014.
 
 ## Normal form
 
 TODO
 
+## Ambiguity
+
+### Inherently ambiguous languages
+
+Although some CFLs have both ambiguous and unambiguous grammars,
+there are others which only have ambiguous grammars.
+Such languages are called inherently ambiguous languages.
+
+Their existence was first proved by Parikh (1961), using [Parikh's theorem](https://en.wikipedia.org/wiki/Parikh%27s_theorem).
+
 ## Undecidable problems
 
 There are lots of interesting ones:
@@ -38,39 +55,36 @@ There are lots of interesting ones:
 Given a CFG, does it generate the language of all strings over the alphabet
 of terminal symbols used in its rules.
 
-### Language equality
+This is the equivalence problem with one side fixed to a grammar that generates all strings.
+
+### Equivalence
 
 Given two CFG, do they accept the same language?
 
-Decidable O(n) for regular expressions!
+Decidable for regular languages (nearly linear time for two DFAs, PSPACE-complete for two regular expressions), and decidable for DPDA.
 
 ### Language inclusion
 
 Given two CFG, is one language included in the other?
 
-### Chomsky hierarchy
+### Inclusions on Chomsky hierarchy
 
 Given a CSG, is its language context-free?
+
 Given a CFG, is its language regular?
 
-### Ambiguity
+### Ambiguity detection
 
 Given a CFG, is it ambiguous?
 
-## Ambiguity
-
-Certain languages can only be recognized by ambiguous grammars.
-
 ## Extended context-free grammar
 
 Grammar in which each right hand side can be a regex.
 
-Same languages as context-free grammars.
-
-Exactly what the lex/yacc pair does.
+Generates the same languages as context-free grammars, since every regular expression can itself be expanded into context-free rules.
 
-It does that for one reason: separating complexities.
+Convenient because it represents well what most parsers do today: first a regex tokenization step, then a parsing step.
 
 ## Deterministic context-free grammar
 
-Same as non deterministic, but with deterministic automaton.
+Same as the non-deterministic case, but recognized by a deterministic pushdown automaton (DPDA).
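
The $O(n^3)$ bound quoted for CYK in the Complexity section comes from filling a table indexed by substring start, substring length and split point. A minimal Python sketch follows, assuming the grammar is already in Chomsky normal form; the function name `cyk_recognize` and the rule encoding (`terminal_rules`, `binary_rules`) are hypothetical choices for this example.

```python
def cyk_recognize(word, terminal_rules, binary_rules, start):
    """Return True if `start` derives `word` under a Chomsky normal form grammar.

    terminal_rules: dict mapping a terminal to the set of non-terminals A with A -> terminal.
    binary_rules: list of (A, B, C) tuples, one per rule A -> B C.
    """
    n = len(word)
    if n == 0:
        # The empty word needs a dedicated rule, which this sketch ignores.
        return False
    # table[i][length] = set of non-terminals that derive word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][1] = set(terminal_rules.get(symbol, ()))
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # substring start
            for split in range(1, length):    # length of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for head, b, c in binary_rules:
                    if b in left and c in right:
                        table[i][length].add(head)
    return start in table[0][n]


# Example grammar for { a^k b^k : k >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
TERMINAL_RULES = {"a": {"A"}, "b": {"B"}}
BINARY_RULES = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]

if __name__ == "__main__":
    for w in ["ab", "aabb", "aab", "ba"]:
        print(w, cyk_recognize(w, TERMINAL_RULES, BINARY_RULES, "S"))
```

The three nested loops over `length`, `i` and `split` give the cubic factor; the inner loop over rules only adds a factor that depends on the grammar size.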
diff --git a/crypto.md b/crypto.md
@@ -39,7 +39,38 @@ e.g. Git SHA to identify objects uniquely.
 
 Desired properties:
 
-- it is easy to compute the hash value for any given message
-- it is infeasible to generate a message that has a given hash
-- it is infeasible to modify a message without changing the hash
-- it is infeasible to find two different messages with the same hash.
+-   it is easy to compute the hash value for any given message
+
+-   it is infeasible to generate a message that has a given hash
+
+-   it is infeasible to modify a message without changing the hash
+
+-   it is infeasible to find two different messages with the same hash.
+
+    This is in general much easier than finding an input with a given hash because of
+    the birthday problem: <http://en.wikipedia.org/wiki/Birthday_problem>
+
+### Implementations
+
+#### SHA-1
+
+160 bits.
+
+SHA-1 is the most popular in 2014. Used in Git.
+
+Attacks were found in 2005, but they were too expensive to be practical.
+
+Some parts of the US government moved to SHA-2 in 2010 because of the weaknesses.
+
+SHA-1 collision attacks are estimated to become practical for organized crime around 2018:
+<https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html>
+
+Google, Microsoft and Mozilla plan to drop support for SHA-1
+for security reasons in 2017 and use SHA-2 instead.
+
+Fixing a short SHA-1 prefix is already practical in 2014 on personal computers:
+<https://github.com/bradfitz/gitbrute>
+
+#### SHA-2
+
+Family of 6 functions, with output lengths of 224, 256, 384 or 512 bits.
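
The remark above that finding two messages with the same hash is much easier than finding a message with a given hash can be made concrete: for an $n$-bit hash, a collision is expected after roughly $2^{n/2}$ attempts, against roughly $2^n$ for a given target. The following Python sketch illustrates this on SHA-1 truncated to a few bytes, using the standard `hashlib` module. The truncation and the helper names `truncated_sha1` and `find_collision` are assumptions made to keep the search tiny; this says nothing about attacking the full 160 bits.

```python
import hashlib


def truncated_sha1(message, n_bytes):
    """Return the first n_bytes of the SHA-1 digest of a bytes message."""
    return hashlib.sha1(message).digest()[:n_bytes]


def find_collision(n_bytes):
    """Return (m1, m2, tries): two different messages with equal truncated digests."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        digest = truncated_sha1(message, n_bytes)
        if digest in seen:
            return seen[digest], message, counter + 1
        seen[digest] = message
        counter += 1


if __name__ == "__main__":
    m1, m2, tries = find_collision(3)
    # With 24 bits, a collision is expected after a few thousand tries,
    # far fewer than the 2**24 possible digests.
    print(m1, m2, tries)
```

The dictionary trades memory for time, which is the usual shape of a birthday search.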
diff --git a/dfa.md b/dfa.md
@@ -4,10 +4,22 @@ Deterministic finite automata.
 
 <http://en.wikipedia.org/wiki/DFA_minimization>
 
-Recognize the same languages as regular expressions.
+Recognize the same languages as regular grammars.
 
 ## Minimization
 
 It is possible to algorithmically minimize a DFA to an equivalent one with the smallest possible number of states.
 
 The minimum is unique up to renaming, so it is also a good canonical form.
+
+Hopcroft's algorithm (1971) does it in $O(n \log n)$.
+
+## Non-deterministic
+
+Equivalent in power to deterministic automata: this was proved by Rabin and Scott in 1959, in the same paper that defined non-deterministic automata.
+
+Interestingly, the same is not the case for pushdown automata,
+for which deterministic ones are strictly less powerful than non-deterministic ones.
+
+Deterministic Turing machines are also equivalent to NTMs in what they can decide,
+but the change may alter the complexity of computations.
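
The equivalence between non-deterministic and deterministic finite automata mentioned above is usually shown with the subset construction: each DFA state is the set of NFA states reachable so far. Below is a minimal Python sketch, assuming an NFA without epsilon transitions; the names `nfa_to_dfa` and `dfa_accepts` and the dictionary encoding are hypothetical choices for this example.

```python
from collections import deque


def nfa_to_dfa(nfa_transitions, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon transitions.

    nfa_transitions: dict mapping (state, symbol) to a set of next states.
    accepting: set of accepting NFA states.
    Returns (dfa_transitions, dfa_start, dfa_accepting); DFA states are frozensets.
    """
    dfa_start = frozenset([start])
    dfa_transitions = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            # Union of all NFA moves from any state in the current subset.
            target = frozenset(
                t for s in current for t in nfa_transitions.get((s, symbol), ())
            )
            dfa_transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_transitions, dfa_start, dfa_accepting


def dfa_accepts(dfa_transitions, dfa_start, dfa_accepting, word):
    """Run the constructed DFA on word and report acceptance."""
    state = dfa_start
    for symbol in word:
        state = dfa_transitions[(state, symbol)]
    return state in dfa_accepting


# Example NFA over {a, b} accepting the words that end in "ab".
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
DFA = nfa_to_dfa(NFA, 0, {2}, "ab")

if __name__ == "__main__":
    for w in ["ab", "aab", "aba", ""]:
        print(repr(w), dfa_accepts(*DFA, w))
```

In the worst case the construction produces exponentially many subsets, so equal power does not mean equal size.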
diff --git a/formal-language.md b/formal-language.md
@@ -0,0 +1,31 @@
+# Formal languages
+
+A language is a set of strings.
+
+A grammar for a language is a set of rules that produces exactly that language.
+It is not easy, given a grammar and a string, to find out how the grammar can generate the string,
+because at each step there are many possible actions.
+
+## Chomsky hierarchy
+
+<https://en.wikipedia.org/wiki/Chomsky_hierarchy>
+
+Famous hierarchy of language classes, each strictly contained in the previous one.
+
+The original hierarchy contains only:
+
+| Grammar                | Automaton                                       | Production rules                                 |
+|------------------------|-------------------------------------------------|--------------------------------------------------|
+| Recursively enumerable | Turing machine                                  |                                                  |
+| Context-sensitive      | Linear-bounded non-deterministic Turing machine | $\alpha A \beta \rightarrow \alpha \gamma \beta$ |
+| Context-free           | Non-deterministic pushdown automaton            | $A \rightarrow \gamma$                           |
+| Regular                | Finite state automaton                          | $A \rightarrow a$ and $A \rightarrow aB$         |
+
+There are however many other well known language classes in between those classes.
+
+- finite language: contains only a finite number of words. Strict subset of Regular.
+- [LL](https://en.wikipedia.org/wiki/LL_grammar), and the related LR, SLL, SLR. Useful subset of context-free. Related automaton: DPDA.
+
+## Classification of popular languages
+
+- C and C++ are ambiguous and cannot be parsed by $LR(1)$, but can be parsed by GLR: <http://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser>
diff --git a/licenses.md b/licenses.md
@@ -15,10 +15,33 @@ so you can basically do anything you want, except remove the license from sub pr
 
 ## GPL
 
-Like MIT, but you *cannot* use it in non GPL PROJECTS (COPYLEFT),
+Like MIT, but you *cannot* use it in non-GPL projects (copyleft),
 meaning that you cannot make a closed source project that uses it.
 
-This makes the project useless for closed source projects to build upon.
+This makes the project useless for closed source projects to build upon,
+and forces users to merge back improvements.
+
+While a beautiful concept, which has arguably worked for the Linux kernel,
+it is a similar principle to communism, and we all know how that went.
+
+The following website belongs to a leading group of lawyers that enforces the GPL:
+<http://gpl-violations.org/about.html>
+
+Notable cases include:
+
+-   Iliad: major telecom player in France, sued in 2008,
+    released its source code modifications in 2011 at
+    <http://floss.freebox.fr> to avoid further legal action.
+
+### Linking and the GPL
+
+It is not very clear from the GPL text whether dynamic and static linking
+from proprietary libraries are allowed or not.
+
+It is therefore better to play it safe and assume that it is not allowed.
+
+[libgit2](https://github.com/libgit2/libgit2) is a notable example of GPLv2 with a linking exception,
+explicitly allowing linking, since GitHub is behind the library.
 
 ## LGPL
 
@@ -28,3 +51,31 @@ Lesser GPL: like MIT, but you must also distribute modifications you make to the
 
 Some licenses are called dual X/Y (ex: dual BSD/GPL), meaning that you can
 take either one of them.
+
+## CC
+
+Family of licenses.
+
+<https://creativecommons.org/licenses/>
+
+Symbols:
+
+- `BY`: attribution to the creator
+- `SA`: share-alike (copyleft)
+- `ND`: no derivatives: cannot distribute modified versions
+- `NC`: non-commercial: cannot be used in commercial products
+
+Many combinations are possible.
+
+This is indicated in their logo.
+
+There is also a `CC0` license which puts things in the public domain.
+
+## Public domain
+
+You have no rights over the work.
+
+Once released into the public domain, you relinquish any rights you have over the work:
+in particular you cannot change it to another license later on.
+
+Anyone can do anything with the work without even mentioning you, except copyright it.