Dipohantine
cirosantilli committed Sep 28, 2014
1 parent cef1343 commit e069235
Showing 17 changed files with 494 additions and 188 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -4,5 +4,5 @@ Computer science topics. Mostly practical algorithms and data structures.

Important sections include:

- [motivation.md](motivation.md): beautiful things about computer science
- [beauty.md](beauty.md): beautiful things about computer science
- [algorithm.md](algorithm.md): starting point for those learning about algorithms
25 changes: 23 additions & 2 deletions algorithm.md
@@ -58,7 +58,26 @@ The first thing to do is to decide on a computer model to work with.

Classical model.

TODO explain.
##### Variants

###### Non-deterministic Turing Machine

###### NTM

Turing machine that has multiple possible transitions per input and state.

It decides between those transitions either:

- optimally through a magic oracle.
- by following all paths at once. TODO: what is the correct output if multiple paths halt?
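The "following all paths at once" view can be sketched as a breadth-first search over machine configurations. This is only an illustration under stated assumptions: the function name `ntm_accepts`, the transition table format, and the `max_steps` cutoff are all hypothetical, and the acceptance rule used (accept if at least one path reaches the accepting state) is the usual textbook convention rather than something stated in these notes.

```python
from collections import deque

BLANK = "_"


def ntm_accepts(transitions, start, accept, tape, max_steps=1000):
    """Explore every branch of a non-deterministic Turing machine.

    transitions maps (state, symbol) to a list of choices
    (new_state, written_symbol, move), with move being -1 (left) or +1 (right).
    Returns True if at least one branch reaches `accept` within max_steps.
    """
    start_config = (start, tuple(tape) or (BLANK,), 0)
    queue = deque([(start_config, 0)])
    seen = {start_config}
    while queue:
        (state, cells, head), steps = queue.popleft()
        if state == accept:
            return True
        if steps == max_steps:
            continue
        # A missing entry means this branch halts without accepting.
        for new_state, written, move in transitions.get((state, cells[head]), []):
            new_cells = list(cells)
            new_cells[head] = written
            new_head = head + move
            if new_head < 0:
                # Grow the tape to the left with a blank cell.
                new_cells.insert(0, BLANK)
                new_head = 0
            elif new_head == len(new_cells):
                # Grow the tape to the right with a blank cell.
                new_cells.append(BLANK)
            config = (new_state, tuple(new_cells), new_head)
            if config not in seen:
                seen.add(config)
                queue.append((config, steps + 1))
    return False


# Hypothetical machine that guesses where two consecutive 1s start.
CONTAINS_11 = {
    ("scan", "0"): [("scan", "0", 1)],
    ("scan", "1"): [("scan", "1", 1), ("saw1", "1", 1)],
    ("saw1", "1"): [("accept", "1", 1)],
}
```

With this table, `ntm_accepts(CONTAINS_11, "scan", "accept", "0110")` is true because one branch guesses correctly, while `"0101"` is rejected because every branch dies.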

##### Limitations of Turing machines

While Turing machines accurately describe decidability of existing systems, they do not model performance so well for the following reasons:

- modern computers have random access memory. Fortunately it is simple to model performance by using the so-called RAM computation model.

- out-of-core operations: for very large inputs, it is necessary to store data in slower storage media like hard disks. It then becomes necessary to model how much slower those accesses are.

#### RAM model

@@ -74,6 +93,8 @@ Sometimes algorithms must operate on data that is too large to fit in RAM, e.g.

Certain algorithms are developed with that restriction in mind, e.g., the B-tree, which is less efficient than binary search trees for in-RAM computing, but much more efficient for out-of-core problems.

There is no simple way of modeling the performance of out-of-core algorithms: we just have to give different weights to certain operations, and then solve complex numerical optimization problems.

#### Input length vs value

Keep in mind that big O analysis uses a Turing machine, so what matters is the *length* of the input, *not* its value.
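As a rough illustration of length versus value, consider primality testing by trial division. The function below is a hypothetical sketch, not part of these notes: it counts divisions so that the cost can be compared against the bit length of the input.

```python
def is_prime_trial_division(n):
    """Return (is_prime, divisions) using naive trial division."""
    if n < 2:
        return False, 0
    divisions = 0
    d = 2
    # Only divisors up to sqrt(n) need to be tried.
    while d * d <= n:
        divisions += 1
        if n % d == 0:
            return False, divisions
        d += 1
    return True, divisions


for n in (8191, 131071):  # two Mersenne primes: 2**13 - 1 and 2**17 - 1
    prime, divisions = is_prime_trial_division(n)
    print(n.bit_length(), "bits:", divisions, "divisions")
```

The loop runs about $\sqrt{n}$ times, which looks cheap in terms of the value $n$. In terms of the input length $b = \log_2 n$ it is about $2^{b/2}$: going from 13 to 17 bits multiplies the work by roughly 4, so the algorithm is exponential in the input length.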
Expand Down Expand Up @@ -225,7 +246,7 @@ If such algorithm is possible, the advantage is obvious: it uses less memory for

### Free sources

lecture notes:
Lecture notes:

- <http://webdocs.cs.ualberta.ca/~holte/t26/top.realtop.html>
- <https://secweb.cs.odu.edu/~zeil/cs361/web/website/directory/page/topics.html>
29 changes: 28 additions & 1 deletion motivation.md → beauty.md
@@ -1,4 +1,4 @@
# Motivation
# Beauty

Links and short descriptions of beautiful problems in computer science.

@@ -307,3 +307,30 @@ They are fun and important to implement solutions using computers.
- Differential equations: ordinary/partial.

- Finite elements.

### Number theory

#### Diophantine equations

##### Hilbert's tenth problem

<https://en.wikipedia.org/wiki/Hilbert%27s_tenth_problem>

Given an integer Diophantine equation $P(x, y, z, ...) = 0$, where $P$ is a multivariate polynomial with integer coefficients, is there an integer solution?

Famously proposed by Hilbert as an important problem in 1900; the last step of the undecidability proof was given by Matiyasevich in 1970.

Interesting subset problems include:

- Fermat's last theorem decides negatively a small subset of Diophantine equations, those of the form $x^n + y^n = z^n$ with $n > 2$ and positive integers $x$, $y$, $z$.

- limiting maximum degree:

- 1: efficient solution
- 2: there is an algorithm: <http://math.stackexchange.com/questions/181380/second-degree-diophantine-equations/181384#comment418090_181384>, but not efficient.
- 3: unsolved
- 4: equivalent to the general problem of degree $n$, so undecidable
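For the degree 1 case, a minimal sketch of the efficient solution in two variables: $a x + b y = c$ has an integer solution exactly when $\gcd(a, b)$ divides $c$, and the extended Euclidean algorithm produces one. The function names are hypothetical, and the general multivariate case is not covered here.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b), with g >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    # Invariant: a*old_x + b*old_y == old_r and a*x + b*y == r.
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def solve_linear_diophantine(a, b, c):
    """Return one integer solution (x, y) of a*x + b*y == c, or None if none exists."""
    if a == 0 and b == 0:
        return (0, 0) if c == 0 else None
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None  # gcd(a, b) does not divide c: no integer solution
    scale = c // g
    return x * scale, y * scale
```

For example, `solve_linear_diophantine(6, 15, 10)` returns `None` because $\gcd(6, 15) = 3$ does not divide 10. The number of loop iterations is logarithmic in the size of the coefficients, which is what makes this case efficient.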

##### Reduction of generating equations to 9 variables

If a set is defined by a system of Diophantine equations, it can also be defined by a system of Diophantine equations in only 9 variables (Matiyasevich 1999).
52 changes: 33 additions & 19 deletions context-free.md
@@ -1,5 +1,7 @@
# Context-free grammar

Related automaton: PDA.

## Application

Sufficient for most programming languages, while regexes are not.
@@ -10,25 +12,40 @@ Usually, programming languages are faster to parse subsets of CFG
most notably deterministic context free grammars,
which parse in $O(n)$ instead of $O(n^3)$.

## Pushdown automata
## Complexity

Non deterministic.
CYK is the most widely used algorithm and recognizes a context-free language in $O(n^3)$.
It is good in practice, but asymptotically better algorithms are already known.

## Recognition complexity
Parsing CFGs and multiplying 0/1 matrices are almost equivalent in time complexity:

CYK algorithm: $O(n^3)$, practically good, but better asymptotic already known.
- Valiant (1975) gave a method that converts any matrix multiplication algorithm
  into a parsing algorithm of the same complexity

Parsing CGFs and multiplying 0/1 matrix algorithms are almost time equivalent.
- somewhat conversely, Lee (2002) proved that any parsing algorithm in $O(n^{3-c})$
can be converted into a matrix multiplication algorithm of $O(n^{3-c/3})$

Therefore the optimal time is linked to matrix multiplication,
whose optimal complexity is still an open problem: the exponent is conjectured to be 2,
even if the best algorithms known are at around $O(n^{2.37})$
with huge constant terms
with huge constant terms.

In practice however, CYK is still the most used algorithm as of 2014.
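A minimal sketch of a CYK recognizer, assuming the grammar is already in Chomsky normal form. The representation chosen here (`terminal_rules` as a dict from terminal to non-terminals, `binary_rules` as a list of triples) and the function name are illustrative assumptions. The three nested loops over length, start position and split point are where the $O(n^3)$ comes from.

```python
def cyk_recognizes(word, terminal_rules, binary_rules, start="S"):
    """Return True if the CNF grammar derives `word` from `start`.

    terminal_rules: dict mapping a terminal to the set of non-terminals A with A -> terminal.
    binary_rules: list of (A, B, C) triples for rules A -> B C.
    """
    n = len(word)
    if n == 0:
        return False  # the empty word needs a special rule, omitted in this sketch
    # table[i][l - 1] holds the non-terminals that derive word[i:i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][0] = set(terminal_rules.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, b, c in binary_rules:
                    if b in left and c in right:
                        table[i][length - 1].add(head)
    return start in table[0][n - 1]


# Hypothetical CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
TERMINAL_RULES = {"a": {"A"}, "b": {"B"}}
BINARY_RULES = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]
```

With this grammar, `cyk_recognizes("aabb", TERMINAL_RULES, BINARY_RULES)` is true and `"abab"` is rejected.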

## Normal form

TODO

## Ambiguity

### Inherently ambiguous languages

Although some CFLs have both ambiguous and unambiguous grammars,
there are others which only have ambiguous grammars.
Such languages are called inherently ambiguous languages.

Their existence was first proved by Parikh (1961) using <https://en.wikipedia.org/wiki/Parikh%27s_theorem>.

## Undecidable problems

There are lots of interesting ones:
@@ -38,39 +55,36 @@ There are lots of interesting ones:
Given a CFG, does it generate the language of all strings over the alphabet
of terminal symbols used in its rules?

### Language equality
Equivalence with one side fixed.

### Equivalence

Given two CFGs, do they generate the same language?

Decidable O(n) for regular expressions!
Decidable for regular languages (in almost linear time when both are given as DFAs), and decidable for DPDA.

### Language inclusion

Given two CFGs, is one language included in the other?

### Chomsky hierarchy
### Inclusions on Chomsky hierarchy

Given a CSG, does it generate a context-free language?

Given a CFG, does it generate a regular language?

### Ambiguity
### Ambiguity detection

Given a CFG, is it ambiguous?

## Ambiguity

Certain languages can only be recognized by ambiguous grammars.

## Extended context-free grammar

Grammar in which each right hand side can be a regex.

Same languages as context-free grammars.

Exactly what the lex/yacc pair does.
Same languages as context-free grammars, since regular languages are contained in context-free languages.

It does that for one reason: separating complexities.
Convenient because it represents well what most parsers do today: first a regex tokenization step, then parse.
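The two-step structure can be sketched with a toy arithmetic language: a regex tokenizer followed by a recursive-descent parser whose rules use regex-like repetition on the right-hand side, e.g. `expr := term (('+' | '-') term)*`. The grammar, token set and function names below are illustrative assumptions only.

```python
import re

# Step 1: regex tokenization. A token is an integer or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


# Step 2: parsing. Each function returns (value, remaining_tokens).
def parse_expression(tokens):
    """expr := term (('+' | '-') term)*"""
    value, rest = parse_term(tokens)
    while rest and rest[0] in ("+", "-"):
        operator = rest[0]
        right, rest = parse_term(rest[1:])
        value = value + right if operator == "+" else value - right
    return value, rest


def parse_term(tokens):
    """term := factor ('*' factor)*"""
    value, rest = parse_factor(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_factor(rest[1:])
        value *= right
    return value, rest


def parse_factor(tokens):
    """factor := NUMBER | '(' expr ')'"""
    if not tokens:
        raise ValueError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head.isdigit():
        return int(head), rest
    if head == "(":
        value, rest = parse_expression(rest)
        if not rest or rest[0] != ")":
            raise ValueError("expected ')'")
        return value, rest[1:]
    raise ValueError(f"unexpected token {head!r}")


def evaluate(text):
    value, rest = parse_expression(tokenize(text))
    if rest:
        raise ValueError(f"unexpected token {rest[0]!r}")
    return value
```

Here `evaluate("2 + 3 * (4 - 1)")` returns 11. The repetition `( ... )*` in each rule becomes a plain `while` loop, which is one reason the extended notation maps so directly onto hand-written parsers.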

## Deterministic context-free grammar

Same as non deterministic, but with deterministic automaton.
Same as non-deterministic, but with deterministic automaton.
39 changes: 35 additions & 4 deletions crypto.md
@@ -39,7 +39,38 @@ e.g. Git SHA to identify objects uniquely.

Desired properties:

- it is easy to compute the hash value for any given message
- it is infeasible to generate a message that has a given hash
- it is infeasible to modify a message without changing the hash
- it is infeasible to find two different messages with the same hash.
- it is easy to compute the hash value for any given message

- it is infeasible to generate a message that has a given hash

- it is infeasible to modify a message without changing the hash

- it is infeasible to find two different messages with the same hash.

This is in general much easier than finding an input with a given hash because of
the birthday problem: <http://en.wikipedia.org/wiki/Birthday_problem>
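The birthday effect can be observed on a deliberately weakened hash. The sketch below truncates SHA-1 to a few hex digits (an assumption made only to keep the search fast, this is not a real attack) and looks for any two distinct messages with the same truncated digest. The helper names are hypothetical.

```python
import hashlib


def truncated_sha1(message, hex_digits):
    """First `hex_digits` hex characters of the SHA-1 digest of `message` (bytes)."""
    return hashlib.sha1(message).hexdigest()[:hex_digits]


def find_collision(hex_digits):
    """Return (m1, m2, attempts): two distinct messages with equal truncated hash."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        digest = truncated_sha1(message, hex_digits)
        if digest in seen:
            return seen[digest], message, counter + 1
        seen[digest] = message
        counter += 1


m1, m2, attempts = find_collision(5)  # 5 hex digits = 20 bits
print(m1, m2, attempts)
```

For an $n$-bit hash, a collision is expected after about $2^{n/2}$ attempts, while hitting one given digest takes about $2^n$. With 20 bits that is on the order of a thousand attempts versus a million.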

### Implementations

#### SHA-1

160 bits.

SHA-1 is the most popular in 2014. Used in Git.

Attacks were found in 2005, but they were too expensive.

Some parts of the US government moved to SHA-2 in 2010 because of the weaknesses.

Collision attacks on SHA-1 are estimated to become practical for organized crime by 2018:
<https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html>

Google, Microsoft and Mozilla plan to drop SHA-1
for security reasons in 2017 and use SHA-2 instead.

Fixing a prefix of a SHA-1 hash was already practical in 2014 on personal computers:
<https://github.com/bradfitz/gitbrute>

#### SHA-2

Family of 6 functions and output lengths.
14 changes: 13 additions & 1 deletion dfa.md
@@ -4,10 +4,22 @@ Deterministic finite automata.

<http://en.wikipedia.org/wiki/DFA_minimization>

Recognize the same languages as regular expressions.
Recognize the same languages as regular grammars.

## Minimization

It is possible to algorithmically minimize a DFA to an equivalent one with the smallest possible number of states.

The minimum is unique up to renaming, so it is also a good canonical form.

Hopcroft (1971) in $O(n \log n)$.
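A minimal sketch of minimization by partition refinement. Note that this is the simpler Moore-style refinement, which is quadratic in the worst case, and not Hopcroft's $O(n \log n)$ algorithm. It assumes a complete DFA whose states are all reachable; the function name and the `delta[state, symbol]` representation are illustrative choices.

```python
def minimize_dfa(states, alphabet, delta, accepting):
    """Return the equivalence classes of states as a set of frozensets.

    delta maps (state, symbol) to the next state and must be total.
    Each class becomes one state of the minimal DFA.
    """
    # Initial partition: accepting versus non-accepting states.
    block_of = {s: int(s in accepting) for s in states}
    while True:
        # Two states stay together only if they are in the same block
        # and each symbol sends them to the same block.
        signature = {
            s: (block_of[s],) + tuple(block_of[delta[s, a]] for a in alphabet)
            for s in states
        }
        if len(set(signature.values())) == len(set(block_of.values())):
            break  # no block was split: the partition is stable
        ids = {}
        block_of = {s: ids.setdefault(signature[s], len(ids)) for s in states}
    classes = {}
    for s in states:
        classes.setdefault(block_of[s], set()).add(s)
    return {frozenset(c) for c in classes.values()}


# Hypothetical DFA for "binary strings ending in 1" with a redundant state q2.
STATES = ["q0", "q1", "q2"]
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q0", ("q2", "1"): "q1",
}
print(minimize_dfa(STATES, "01", DELTA, {"q1"}))
```

Here `q0` and `q2` end up in the same class, so the minimal DFA has two states.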

## Non-deterministic

Equivalent power to deterministic, proved together with the definition by Rabin and Scott in 1959.

Interestingly, the same is not the case for pushdown automata,
in which the deterministic ones are less powerful than the non-deterministic ones.

Turing machines are also equivalent to NTMs,
but the change alters the complexity of computations.
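The equivalence between non-deterministic and deterministic finite automata can be made concrete with the subset construction, where each DFA state is a set of NFA states. The sketch below assumes an NFA without epsilon-transitions; the names and the `nfa_delta` format are hypothetical.

```python
def nfa_to_dfa(alphabet, nfa_delta, start, accepting):
    """Subset construction.

    nfa_delta maps (state, symbol) to a set of possible next states.
    Returns (dfa_delta, dfa_start, dfa_accepting); DFA states are frozensets.
    """
    accepting = set(accepting)
    dfa_start = frozenset([start])
    dfa_delta = {}
    dfa_accepting = set()
    todo = [dfa_start]
    seen = {dfa_start}
    while todo:
        current = todo.pop()
        # A subset accepts if any NFA state in it accepts.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in nfa_delta.get((s, symbol), ())
            )
            dfa_delta[current, symbol] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return dfa_delta, dfa_start, dfa_accepting


def dfa_accepts(dfa_delta, dfa_start, dfa_accepting, word):
    state = dfa_start
    for symbol in word:
        state = dfa_delta[state, symbol]
    return state in dfa_accepting


# Hypothetical NFA: binary strings whose second-to-last symbol is 1.
NFA_DELTA = {
    ("p", "0"): {"p"},
    ("p", "1"): {"p", "q"},
    ("q", "0"): {"r"},
    ("q", "1"): {"r"},
}
DFA = nfa_to_dfa("01", NFA_DELTA, "p", {"r"})
```

The resulting DFA accepts `"0110"` and rejects `"01"`. The price of determinism is size: an NFA with $k$ states can require up to $2^k$ DFA states, which hints at why the same change alters complexity for Turing machines.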
31 changes: 31 additions & 0 deletions formal-language.md
@@ -0,0 +1,31 @@
# Formal languages

A language is a set of strings.

A grammar for a language is a set of rules that produces exactly that language.
It is not easy, given a grammar and a string, to find out how the grammar can generate the string,
because at each step there are many possible actions.

## Chomsky hierarchy

<https://en.wikipedia.org/wiki/Chomsky_hierarchy>

Famous hierarchy of classes of languages, each strictly contained in the next.

The original hierarchy contains only:

| Grammar | Automaton | Production rules |
|------------------------|-------------------------------------------------|--------------------------------------------------|
| Recursively enumerable | Turing machine | |
| Context-sensitive | Linear-bounded non-deterministic Turing machine | $\alpha A \beta \rightarrow \alpha \gamma \beta$ |
| Context-free           | Non-deterministic pushdown automaton            | $A \rightarrow \gamma$                           |
| Regular | Finite state automaton | $A \rightarrow a$ and $A \rightarrow aB$ |

There are however many other well known languages in between those classes.

- finite language: contains only a finite number of words. Strict subset of Regular.
- [LL](https://en.wikipedia.org/wiki/LL_grammar), and the related LR, SLL, SLR. Useful subset of context-free. Related automaton: DPDA.

## Category of popular languages

- C and C++ are ambiguous and cannot be parsed by $LR(1)$, but can be parsed by GLR: <http://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser>
55 changes: 53 additions & 2 deletions licenses.md
@@ -15,10 +15,33 @@ so you can basically do anything you want, except remove the license from sub pr

## GPL

Like MIT, but you *cannot* use it in non GPL PROJECTS (COPYLEFT),
Like MIT, but you *cannot* use it in non-GPL projects (copyleft),
meaning that you cannot make a closed source project that uses it.

This makes the project useless for closed source projects to build upon.
This makes the project useless for closed source projects to build upon,
and forces users to merge back improvements.

While a beautiful concept, which has arguably worked for the Linux kernel,
it is a similar principle to communism, and we all know how that went.

The following website belongs to a leading group of lawyers that enforces the GPL:
<http://gpl-violations.org/about.html>

Notable cases include:

- Iliad: major telecom player in France, sued in 2008,
  released its source code modifications in 2011 at
  <http://floss.freebox.fr> to avoid further legal action.

### Linking and the GPL

It is not very clear from the GPL text whether dynamic and static linking
from proprietary code are allowed or not.

It is therefore better to play it safe and assume that it is not possible.

[libgit2](https://github.com/libgit2/libgit2) is a notable example of GPLv2 with a linking exception,
explicitly allowing linking, since GitHub is behind the library.

## LGPL

@@ -28,3 +51,31 @@ Lesser GPL: like MIT, but you must also distribute modifications you make to the

Some licenses are called dual X/Y (ex: dual BSD/GPL), meaning that you can
take either one of them.

## CC

Family of licenses.

<https://creativecommons.org/licenses/>

Symbols:

- `BY`: attribution to creator
- `SA`: copyleft
- `ND`: no derivatives: cannot modify
- `NC`: cannot use in commercial products

Many combinations are possible.

This is indicated in their logo.

There is also a `CC0` license which puts things in the public domain.

## Public domain

You have no rights over the work.

Once you release a work into the public domain, you relinquish any rights you have over the work:
in particular you cannot change it to another license later on.

Anyone can do anything with the work without even mentioning you, except copyright it.
