# Cryptography
Corresponding in-depth slides (from Cryptography I) are at: Blockchain/CryptographyI/LectureSlides

## Introduction to Cryptography

**What is cryptography?**
> **Cryptography** is the practice and study of techniques for secure communication in the presence of third parties called adversaries. More generally, cryptography is about constructing and analyzing protocols that prevent third parties or the public from reading private messages; various aspects in information security such as **data confidentiality**, **data integrity**, **authentication**, and **non-repudiation** are central to modern cryptography. Modern cryptography exists at the intersection of the disciplines of mathematics, computer science, electrical engineering, communication science, and physics. Applications of cryptography include electronic commerce, chip-based payment cards, digital currencies, computer passwords, and military communications.

Cryptography prior to the modern age was effectively synonymous with encryption, the conversion of information from a readable state to apparent nonsense. The originator of an encrypted message shared the decoding technique needed to recover the original information only with intended recipients, thereby precluding unwanted persons from doing the same. The cryptography literature often uses the name Alice ("A") for the sender, Bob ("B") for the intended recipient, and Eve ("eavesdropper") for the adversary. 

**History of Cryptography:**
- Refer to the History segment inside the following PDF: Blockchain/CryptographyI/LectureSlides/Week1/Introduction.pdf
- [History of Cryptography](https://www.tutorialspoint.com/cryptography/origin_of_cryptography.htm)

**Modern Cryptography:**
>Modern cryptography is heavily based on mathematical theory and computer science practice; **cryptographic algorithms are designed around computational hardness assumptions**, making such algorithms hard to break in practice by any adversary. It is theoretically possible to break such a system, but it is infeasible to do so by any known practical means. These schemes are therefore termed computationally secure; theoretical advances, e.g., improvements in integer factorization algorithms, and faster computing technology require these solutions to be continually adapted. There exist **information-theoretically secure** schemes that provably cannot be broken even with unlimited computing power (an example is the one-time pad), but these schemes are more difficult to implement than the best theoretically breakable but computationally secure mechanisms.

More on: [Modern Cryptography](https://www.tutorialspoint.com/cryptography/modern_cryptography.htm)

**Cryptology:**<br>
The art of devising ciphers (i.e., cryptography) and breaking them (i.e., cryptanalysis) is collectively known as cryptology.

**Cryptanalysis:**<br>
Cryptanalysis is the study of analyzing information systems in order to study the hidden aspects of the systems. Cryptanalysis is used to breach cryptographic security systems and gain access to the contents of encrypted messages, even if the cryptographic key is unknown.

In addition to mathematical analysis of cryptographic algorithms, cryptanalysis includes the study of side-channel attacks that do not target weaknesses in the cryptographic algorithms themselves, but instead exploit weaknesses in their implementation.

>Cryptanalysis is the sister branch of cryptography, and the two co-exist. The cryptographic process results in the ciphertext for transmission or storage. Cryptanalysis involves the study of cryptographic mechanisms with the intention of breaking them. It is also used during the design of new cryptographic techniques to test their security strength.

### Security Services of Cryptography

The primary objective of using cryptography is to provide the following **four fundamental information security services:**

**Confidentiality**

Confidentiality is the fundamental security service provided by cryptography. It keeps information from unauthorized persons. It is sometimes referred to as privacy or secrecy.

Confidentiality can be achieved through numerous means, ranging from physical protection to the use of mathematical algorithms for data encryption.

**Data Integrity**

It is a security service that deals with identifying any alteration to the data. The data may be modified by an unauthorized entity, intentionally or accidentally. The integrity service confirms whether or not the data is intact since it was last created, transmitted, or stored by an authorized user.

Data integrity cannot prevent the alteration of data, but provides a means for detecting whether data has been manipulated in an unauthorized manner.
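
As a rough illustration of detection (not prevention), the sketch below compares a stored SHA-256 digest against a freshly computed one. The helper name `digest` and the sample messages are made up for this example. Note the assumption that the stored digest itself is out of the attacker's reach; when it is not, a keyed primitive such as a MAC is needed instead of a bare hash.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"transfer 100 to Bob"
stored_digest = digest(original)   # assumed to be kept where it cannot be altered

tampered = b"transfer 900 to Bob"  # same length, one character changed

# Unchanged data reproduces the stored digest; altered data does not.
print("original intact:", digest(original) == stored_digest)
print("tampered intact:", digest(tampered) == stored_digest)
```

The check only reports that something changed; it cannot say what changed or restore the data.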

**Authentication**

Authentication provides the identification of the originator. It confirms to the receiver that the data received has been sent only by an identified and verified sender.

>Authentication service has two variants:
- Message authentication identifies the originator of the message without regard to the router or system that has sent the message.
- Entity authentication is assurance that data has been received from a specific entity, say a particular website.

Apart from the originator, authentication may also provide assurance about other parameters related to data such as the date and time of creation/transmission.

**Non-repudiation**

It is a security service that ensures that an entity cannot refuse the ownership of a previous commitment or an action. It is an assurance that the original creator of the data cannot deny the creation or transmission of the said data to a recipient or third party.

Non-repudiation is a property that is most desirable in situations where there are chances of a dispute over the exchange of data. For example, once an order is placed electronically, a purchaser cannot deny the purchase order, if non-repudiation service was enabled in this transaction.

### Cryptography Primitives

Cryptographic primitives are well-established, low-level cryptographic algorithms that are frequently used to build cryptographic protocols for computer security systems. Alternatively, cryptography primitives can be defined as the tools and techniques in Cryptography that can be selectively used to provide a set of desired security services. 

Following are some cryptography primitives:
- Encryption
- Hash functions
- Message Authentication codes (MAC)
- Digital Signatures

When creating cryptographic systems, designers use cryptographic primitives as their most basic building blocks. Because of this, cryptographic primitives are designed to do one very specific task in a highly reliable fashion.


The following table shows the primitives that can achieve a particular security service on their own.

![](./Images/CryptoPrimitives.png "Crypto Primitives and their corresponding security service")

Note: Cryptographic primitives are intricately related and they are often combined to achieve a set of desired security services from a cryptosystem.

**The three steps in cryptography:**
>When we introduce/devise a new primitive, these 3 steps have to be rigorously followed:
1. **Precisely specify the threat model:** The threat model describes the capabilities of the adversary, i.e., what an adversary can do to attack the primitive and what their goal is in breaking it. To show that the primitive or cryptographic protocol is secure, we need to prove that an adversary with those capabilities would not be able to break the primitive/protocol. More on [Threat Model](https://www.youtube.com/watch?v=f4tk2pnOUos)
2. **Propose a construction**
3. **Prove that breaking the construction under the threat model would solve an underlying hard problem**: A basic example: it is easy to multiply two large primes to get a value N, but it is hard to recover the factors given only N. So, if our primitive is built on that problem, then an adversary who breaks our primitive/protocol would also yield a solution to that hard problem.
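
To get a feel for the asymmetry in step 3, here is a toy sketch: multiplying two primes is a single operation, while the naive way back (trial division, chosen here only for illustration) has to search. The primes are deliberately tiny so the script finishes instantly; real systems use primes hundreds of digits long, and real attackers use far better algorithms than this one.

```python
from typing import Tuple

def trial_division(n: int) -> Tuple[int, int]:
    """Naively factor n by trying divisors; work grows with the smaller factor."""
    if n % 2 == 0:
        return 2, n // 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("n has no non-trivial factor")

p, q = 10007, 10009      # two small primes, illustrative only
N = p * q                # the easy direction: one multiplication

# The hard direction: thousands of trial divisions even for this tiny N.
print(trial_division(N))
```

Each extra digit in the primes multiplies the search effort of this naive method by roughly ten, while the multiplication stays cheap.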

**Key Note:** For production systems, never use your own implementation of a primitive or of any cryptographic algorithm: aside from implementation errors, there can be many side channels that make your implementation easy to breach. Always use a trusted library to encrypt production-level data/information. [Explanatory Video](https://www.youtube.com/watch?v=3Re5xlEjC8w)

## Crash Course on Discrete Probability

Why Discrete Probability?
> Over the years many natural cryptographic constructions were found to be insecure. In response, modern cryptography was developed as a rigorous science where constructions are always accompanied by a proof of security. The language used to describe security relies on discrete probability.<br>

Reference Reads: 
- **Highly recommended (Easy to Digest and Preferable read to get it all):** [Discrete Probability](https://en.wikibooks.org/wiki/High_School_Mathematics_Extensions/Discrete_Probability)
-  Refer to the Discrete Probability Crash Course segment inside the following PDF: Blockchain/CryptographyI/LectureNotes/Week1/Introduction.pdf
- [Discrete vs Continuous Random Variables](http://www.henry.k12.ga.us/ugh/apstat/chapternotes/7supplement.html)
- [Random Variable vs Events](https://www.quora.com/What-is-the-difference-between-an-event-and-a-random-variable)

Reference Videos: 
- [Discrete Probability Crash Course [Part 1]](https://www.coursera.org/learn/crypto/lecture/qaEcL/discrete-probability-crash-course)
- [Discrete Probability Crash Course [Part 2]](https://www.coursera.org/learn/crypto/lecture/JkDRg/discrete-probability-crash-course-cont)
- [Probability Distribution for Random Variable X](https://www.youtube.com/watch?v=cqK3uRoPtk0)

**Deterministic vs Randomized Algorithms:**
> Thanks to discrete probability, cryptographic algorithms moved from being deterministic (producing the same output for a given input every time) to the randomized algorithms that we use today. <br><br>
>**Randomized Algorithms:** are those that can produce different outputs given the same input. A randomized algorithm has an implicit argument, say r, which is sampled anew from its given universe every time the algorithm is run, which makes the outcome differ between runs.<br><br>
The output of a randomized algorithm is therefore a random variable: for example, a distribution over the set of all possible encryptions of message m under a uniform key r.
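
A minimal sketch of the idea, with the implicit argument r made explicit. The function names are invented for this example, and XOR is used as a stand-in for "some algorithm that consumes r".

```python
import secrets
from typing import Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def randomized_encrypt(m: bytes, r: Optional[bytes] = None) -> bytes:
    """A(m; r): r is the implicit argument, sampled fresh unless supplied."""
    if r is None:
        r = secrets.token_bytes(len(m))  # new uniform r on every run
    return xor_bytes(m, r)

m = b"same input every time"
fixed_r = bytes(len(m))  # all-zero r, used only to show the deterministic case

# With r pinned, the algorithm behaves deterministically.
print(randomized_encrypt(m, fixed_r) == randomized_encrypt(m, fixed_r))
# With r sampled anew, two runs on the same m differ except with negligible probability.
print(randomized_encrypt(m) == randomized_encrypt(m))
```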

More on Randomized Algorithms: Refer to the Randomized Algorithms topic under the Discrete Probability segment inside the following PDF: Blockchain/CryptographyI/LectureNotes/Week1/Introduction.pdf

**XOR:**
XOR is very important in cryptography. Review: the XOR of two bit strings is their bitwise addition mod 2. Also, anything XORed with itself yields zeros: x XOR x = 0.
[Why is XOR important in cryptography?](https://www.quora.com/Why-is-XOR-important-in-cryptography).

![](./Images/XOR-Property.png "The Important Property of XOR")

Note: Review the video [Discrete Probability Crash Course [Part 2]](https://www.coursera.org/learn/crypto/lecture/JkDRg/discrete-probability-crash-course-cont) from 6:19 onward, where the important property of XOR that makes it so useful in cryptography is explained.
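
Assuming the property in question is the one stated in the lecture (if Y is any random variable over n-bit strings and X is an independent uniform random variable over n-bit strings, then Z = Y XOR X is uniform), it can be checked exactly for a small case. The biased distribution chosen for Y below is arbitrary and purely illustrative.

```python
from fractions import Fraction
from itertools import product

n = 2
values = range(2 ** n)

# Y: an arbitrary, heavily biased distribution over 2-bit values.
pr_y = {0: Fraction(7, 10), 1: Fraction(1, 10), 2: Fraction(1, 5), 3: Fraction(0)}
# X: uniform over 2-bit values and independent of Y.
pr_x = {x: Fraction(1, 2 ** n) for x in values}

# Distribution of Z = Y XOR X, computed by summing over all (y, x) pairs.
pr_z = {z: Fraction(0) for z in values}
for y, x in product(values, values):
    pr_z[y ^ x] += pr_y[y] * pr_x[x]

for z in values:
    print(format(z, "02b"), pr_z[z])
```

However skewed Y is, every value of Z comes out with probability 1/4, which is why XORing with a uniform key hides the message distribution.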


## Cryptosystems

A cryptosystem is an implementation of cryptographic techniques and their accompanying infrastructure to provide information security services. A cryptosystem is also referred to as a cipher system.

### Components of a Cryptosystem

The various components of a basic cryptosystem are as follows:

- **Plaintext:** It is the data to be protected during transmission.

- **Encryption Algorithm:** It is a mathematical process that produces a ciphertext for any given plaintext and encryption key. It is a cryptographic algorithm that takes plaintext and an encryption key as input and produces a ciphertext.

- **Ciphertext:** It is the scrambled version of the plaintext produced by the encryption algorithm using a specific encryption key. The ciphertext is not guarded. It flows over a public channel. It can be intercepted or compromised by anyone who has access to the communication channel.

- **Decryption Algorithm:** It is a mathematical process, that produces a unique plaintext for any given ciphertext and decryption key. It is a cryptographic algorithm that takes a ciphertext and a decryption key as input, and outputs a plaintext. The decryption algorithm essentially reverses the encryption algorithm and is thus closely related to it.

- **Encryption Key:** It is a value that is known to the sender. The sender inputs the encryption key into the encryption algorithm along with the plaintext in order to compute the ciphertext.

- **Decryption Key:** It is a value that is known to the receiver. The decryption key is related to the encryption key, but is not always identical to it. The receiver inputs the decryption key into the decryption algorithm along with the ciphertext in order to compute the plaintext.

For a given cryptosystem, a collection of all possible decryption keys is called a **key space**.

An interceptor (an attacker) is an unauthorized entity who attempts to determine the plaintext. He can see the ciphertext and may know the decryption algorithm. He, however, must never know the decryption key.

### Types of Cryptosystems
Fundamentally, there are two types of cryptosystems based on the manner in which encryption-decryption is carried out in the system:

- Symmetric Key Encryption
- Asymmetric Key Encryption

The main difference between these cryptosystems is the relationship between the encryption and the decryption key. Logically, in any cryptosystem, both the keys are closely associated. It is practically impossible to decrypt the ciphertext with the key that is unrelated to the encryption key.

### Symmetric Key Encryption
The encryption process where the **same key is used for encrypting and decrypting** the information is known as Symmetric Key Encryption.
The study of symmetric cryptosystems is referred to as symmetric cryptography. Symmetric cryptosystems are also sometimes referred to as **secret key cryptosystems**.

**Crypto Lesson:** In a symmetric cryptosystem, you should never use a single shared key to encrypt data/information in both directions, i.e., traffic from Client (C) to Server (S) should not be encrypted with the same key as traffic from Server to Client. Hence, the shared key should be a pair of keys => $K_{shared}$ = {$K_{C>>S}$, $K_{S>>C}$}, where the former is used to encrypt/decrypt information from client to server and the latter from server to client.
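
The notes do not say how the two keys are obtained. One possible approach, shown here purely as an assumption, is to derive both from a single shared secret using HMAC-SHA256 with a different label per direction. The function name and label strings are hypothetical; a production system would rely on the key schedule of a vetted protocol or library.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

def derive_direction_keys(shared_secret: bytes) -> Tuple[bytes, bytes]:
    """Derive (K_C>>S, K_S>>C) from one secret; the labels are illustrative."""
    k_c2s = hmac.new(shared_secret, b"client-to-server", hashlib.sha256).digest()
    k_s2c = hmac.new(shared_secret, b"server-to-client", hashlib.sha256).digest()
    return k_c2s, k_s2c

shared_secret = secrets.token_bytes(32)
k_c2s, k_s2c = derive_direction_keys(shared_secret)

# Both sides compute the same pair, and the two directions never share a key.
print(k_c2s != k_s2c, len(k_c2s), len(k_s2c))
```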

**Symmetric Ciphers:**
![](./Images/SymmetricCipherDef.png )

### One Time Pad and Information Theoretic Security: 
Short Intuitive Description: [One Time Pad](https://www.khanacademy.org/computing/computer-science/cryptography/crypt/v/one-time-pad "by Khan Academy")<br>
Recommended Watch: [One Time Pad and Information Theoretic Security](https://www.coursera.org/learn/crypto/lecture/cbnX1/information-theoretic-security-and-the-one-time-pad "by Coursera: Cryptography I") <br>
Go through the One Time Pad section in: Blockchain/CryptographyI/LectureSlides/Week1/StreamCiphers.pdf

One-time pad (OTP), also called the Vernam cipher or the perfect cipher, is a symmetric cipher in which the plaintext is combined with a random key (the key is a uniform random variable over a key space K, i.e., every key in K is selected with equal probability). When used correctly it provides perfect secrecy, which is why it is often described as mathematically unbreakable.

>The one-time pad (OTP) is an encryption technique that cannot be cracked, but requires the use of a one-time pre-shared key the same size as, or longer than, the message being sent. In this technique, a plaintext is paired with a random secret key (also referred to as a one-time pad). Then, each bit or character of the plaintext is encrypted by combining it with the corresponding bit or character from the pad using modular addition. If the key is truly random, is at least as long as the plaintext, is never reused in whole or in part, and is kept completely secret, then the resulting ciphertext will be impossible to decrypt or break.

>Even infinite computational power and infinite time cannot break one-time pad encryption, simply because it is mathematically impossible. However, if only one of these rules is disregarded, the cipher is no longer unbreakable.
- The key is at least as long as the message or data that must be encrypted.
- The key is truly random (not generated by a simple computer function or such)
- Key and plaintext are calculated modulo 10 (digits), modulo 26 (letters) or modulo 2 (binary)
- Each key is used only once (that's why we call it **One Time Pad**), and both sender and receiver must destroy their key after use. If the same key is used twice, the security will be compromised.
- There should only be two copies of the key: one for the sender and one for the receiver (some exceptions exist for multiple receivers)
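
A minimal sketch of the binary (modulo 2) variant, where combining key and plaintext is a bitwise XOR. The function names are made up for this example. One loud caveat: `secrets.token_bytes` draws from the operating system's random source, which is used here as a stand-in for the truly random key the rules above demand.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR; refuses inputs of different length."""
    if len(a) != len(b):
        raise ValueError("key and message must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def otp_keygen(n: int) -> bytes:
    """Sample an n-byte key (stand-in for a truly random pad)."""
    return secrets.token_bytes(n)

def otp_encrypt(key: bytes, message: bytes) -> bytes:
    return xor_bytes(key, message)

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return xor_bytes(key, ciphertext)   # XOR is its own inverse

message = b"meet me at noon"
key = otp_keygen(len(message))          # as long as the message, used once
ciphertext = otp_encrypt(key, message)

print(otp_decrypt(key, ciphertext) == message)
```

Decryption works because (m XOR k) XOR k = m; the key must then be discarded rather than reused.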

**Perfect Secrecy:** Perfect secrecy is the notion that, given an encrypted message (or ciphertext) from a perfectly secure encryption system (or cipher), absolutely nothing will be revealed about the unencrypted message (or plaintext) by the ciphertext i.e. perfect secrecy could be defined as having an absolute immunity to Cipher text only attacks. <br>

How does the One Time Pad achieve perfect secrecy?
>Shannon proved in 1949 that the One Time Pad has perfect secrecy. Given a ciphertext c of length n (which the adversary may have intercepted), the probability that a message m encrypts to c is the same for every message m of length n, because for any message m there exists exactly one key that maps m to c (namely k = m XOR c) and every key is chosen with the same probability. Hence, c is equally likely to be an encryption of $m_{1}$, $m_{2}$, ..., $m_{xyz}$, and it reveals nothing about which message was sent.
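
The counting argument can be checked by brute force on a tiny space. The sketch below uses 3-bit messages, keys, and ciphertexts (a size chosen only so the enumeration is instant) and counts, for every pair (m, c), how many keys map m to c.

```python
n = 3
space = range(2 ** n)   # all 3-bit values, used for messages, keys and ciphertexts

# For each (message, ciphertext) pair, count the keys k with m XOR k == c.
key_counts = {
    (m, c): sum(1 for k in space if m ^ k == c)
    for m in space
    for c in space
}

print(set(key_counts.values()))         # every pair is hit by the same number of keys
print(all((m ^ c) in space for m in space for c in space))
```

Since each pair is reached by exactly one key out of 8, Pr[E(k, m) = c] = 1/8 for every m, so observing c does not favor any message.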

**Pitfalls:** 
- Although the One Time Pad has perfect secrecy, it is largely impractical (the key must be as long as the message, and if we had a way to secretly transfer a message-long key, we could use that same mechanism to transfer the message itself in the first place).
- Though it is secure against ciphertext-only attacks, it is vulnerable to other forms of attack.

**Bad News:** Shannon, who proved that the OTP has perfect secrecy, also proved a theorem stating: "To have perfect secrecy the key size must always be greater than or equal to the size of the message". Hence, ciphers that use keys smaller than the messages cannot have perfect secrecy.

### Stream Ciphers

**Question:** How do we make a cipher that, even without perfect secrecy, still provides an acceptable level of security (i.e., how do we make the OTP practical)?
>**Solution:** Replace the random key (of length equal to or greater than the message) with a pseudorandom key derived from a much shorter seed. The pseudorandom key is generated using a **PRG (Pseudo Random Generator)**, which is a function G that takes a seed from the seed space (the initial key space, say {0, 1}$^{s}$) and expands it into a string in {0, 1}$^{n}$ such that n>>s. The seed k is chosen at random, and the output G(k) is used as the pad. Ciphers using a PRG are referred to as Stream Ciphers, and they retain the rule that a pseudorandom key is used to pad only one message, never multiple messages.

![](./Images/PRG.png "Generator G expands the seed k into a longer pseudorandom key G(k), which is XORed with the message")
Note: A PRG should be efficiently computable by a deterministic algorithm. The key G(k) that is XORed with the message is a pseudorandom pad, not a truly random pad.
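
A rough sketch of the construction: a short seed is stretched to the message length and the result is XORed with the message. SHAKE-256 from Python's `hashlib` is used as the generator G purely as a stand-in chosen for this example; the notes do not name a specific PRG, and this is not a vetted stream cipher.

```python
import hashlib
import secrets

def G(seed: bytes, n: int) -> bytes:
    """Toy PRG: deterministically expand a short seed into n bytes."""
    return hashlib.shake_256(seed).digest(n)

def stream_encrypt(seed: bytes, message: bytes) -> bytes:
    """c = m XOR G(k), with the pad truncated to the message length."""
    pad = G(seed, len(message))
    return bytes(m ^ p for m, p in zip(message, pad))

stream_decrypt = stream_encrypt   # XOR with the same pad undoes the encryption

seed = secrets.token_bytes(16)    # 16-byte seed, far shorter than the message
message = b"a message that is much longer than the sixteen byte seed " * 4
ciphertext = stream_encrypt(seed, message)

print(len(seed), len(message))
print(stream_decrypt(seed, ciphertext) == message)
```

Because G is deterministic, sender and receiver only need to share the short seed, but the same seed must still not be used to pad a second message.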

**Stream Cipher:** A stream cipher is a symmetric key cipher where plaintext digits are combined with a pseudorandom cipher digit stream (keystream). In a stream cipher, each plaintext digit is encrypted one at a time with the corresponding digit of the keystream, to give a digit of the ciphertext stream. Since encryption of each digit is dependent on the current state of the cipher, it is also known as state cipher. In practice, a digit is typically a bit and the combining operation an exclusive-or (XOR).<br>
Note: Stream ciphers convert one symbol of plaintext directly into a symbol of ciphertext.

**Redefining Security:**<br>
**Stream ciphers cannot have perfect secrecy!!** Therefore, we need to rethink how we define security as perfect secrecy is not practically feasible. So:
- Need a different definition of security
- Security will depend on specific PRG >> [Security Definition](#PRG-Security-Definition)

**Fundamental Requirement to secure Stream Ciphers:**<br>
A minimal property that a pseudorandom generator must have is unpredictability, i.e., the **PRG must be unpredictable**. Therefore, for a stream cipher to be secure, the PRG it uses must, at a minimum, be unpredictable.

For a generator, being unpredictable means that given the first few bits of its output (the pseudorandom key), say bits 1...i, there is no efficient algorithm that can compute the remaining bits i+1...n of the stream.
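
For contrast, here is a sketch of a generator that fails this requirement. It is a linear congruential generator that outputs its full internal state, with constants picked for illustration and assumed to be public. An attacker who sees a single output can reproduce everything that follows.

```python
from typing import List

# Illustrative public parameters of the generator.
A, C, M = 1103515245, 12345, 2 ** 31

def lcg_stream(seed: int, count: int) -> List[int]:
    """Weak generator: each output is the entire next internal state."""
    state = seed
    outputs = []
    for _ in range(count):
        state = (A * state + C) % M
        outputs.append(state)
    return outputs

def predict_rest(observed_output: int, count: int) -> List[int]:
    """Attacker: one observed output is the state, so just keep iterating."""
    return lcg_stream(observed_output, count)

secret_seed = 20240517
stream = lcg_stream(secret_seed, 5)

# Knowing only stream[0], the attacker recovers stream[1:] without the seed.
print(predict_rest(stream[0], 4) == stream[1:])
```

A stream cipher built on such a generator is broken as soon as the start of one plaintext is known, since that reveals the start of the pad.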

**Attacks on One Time Pad/Stream ciphers:**
- Attack 1: **Two time pad is insecure**, i.e., if the same key (or pseudorandom key) is used to pad two different messages (m1 and m2), producing ciphertexts c1 and c2, then an adversary who captures both ciphertexts learns c1 XOR c2 = m1 XOR m2, from which both m1 and m2 can often be recovered using only the ciphertexts. Hence, a Stream Cipher key or a One Time Pad key should never be used more than once.
![](./Images/TwoTimePad.png)
<br>
- Attack 2: The One Time Pad, and Stream Ciphers in general, provide **no integrity** at all (all they do is try to provide confidentiality when the key is used only once); their ciphertexts are therefore said to be malleable.
![](./Images/MalleableOTP.png)
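
Both attacks can be reproduced in a few lines. The sketch below uses made-up messages and a key from `secrets` as a stand-in for the pad: first it shows that reusing a pad cancels the key out, then that flipping ciphertext bits changes the plaintext in a predictable way without knowledge of the key.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Attack 1: the same pad used for two messages.
m1 = b"attack at dawn!!"
m2 = b"retreat at dusk!"
pad = secrets.token_bytes(len(m1))
c1, c2 = xor_bytes(pad, m1), xor_bytes(pad, m2)
leak = xor_bytes(c1, c2)                 # the pad cancels: equals m1 XOR m2
print(leak == xor_bytes(m1, m2))

# Attack 2: malleability. The attacker knows the format, not the key.
m = b"From: Bob"
key = secrets.token_bytes(len(m))
c = xor_bytes(key, m)
patch = bytes(6) + xor_bytes(b"Bob", b"Eve")   # zeros leave "From: " untouched
c_modified = xor_bytes(c, patch)
print(xor_bytes(key, c_modified))        # the receiver decrypts a forged sender
```

In the second case the receiver has no way to notice the change, which is why confidentiality alone is not enough and an integrity mechanism has to be added on top.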

Real World Examples:<br>
- **Old Stream Ciphers:** RC4, CSS etc.
- **Modern Stream Ciphers:** The eStream project (which qualified 5 stream ciphers). In addition to the seed, a modern stream cipher uses a nonce, which is a value that never repeats for a given key. Hence, we can reuse a key, because the nonce makes the (k, r) pair unique.
![](./Images/ModernStreamCiphers.png)
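
A toy sketch of the nonce idea: the pad depends on the pair (k, r), so one key can serve many messages as long as r never repeats. SHAKE-256 over `key + nonce` is an assumption made for this example and not one of the eStream ciphers; the message counter used as the nonce is likewise illustrative.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy PRG over the (key, nonce) pair; fixed-length key and nonce assumed."""
    return hashlib.shake_256(key + nonce).digest(n)

def encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    pad = keystream(key, nonce, len(message))
    return bytes(m ^ p for m, p in zip(message, pad))

decrypt = encrypt   # same operation in both directions

key = secrets.token_bytes(32)
messages = [b"first packet...", b"second packet.."]

# A per-message counter serves as the nonce, so it never repeats under this key.
ciphertexts = [
    encrypt(key, counter.to_bytes(8, "big"), m)
    for counter, m in enumerate(messages)
]

pad0 = keystream(key, (0).to_bytes(8, "big"), 15)
pad1 = keystream(key, (1).to_bytes(8, "big"), 15)
print(pad0 != pad1)   # same key, different nonce, different pad
```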

#### PRG Security Definition
Recommended Watch: [PRG Security](https://www.coursera.org/learn/crypto/lecture/De10M/prg-security-definitions)

The security of a Stream Cipher depends on how secure the Pseudo Random Generator it uses is. In turn, the PRG is regarded as secure if its output is **indistinguishable from truly random output.** That is, the distribution of the pseudorandom output is indistinguishable from a truly random (uniform) distribution.

![](./Images/IndistinguishablePRG.png)
>Goal: To show that the pseudorandom output G(k), where k is chosen at random from the seed space K = {0, 1}$^{s}$ and G(k) lies in the expanded space {0, 1}$^n$, is indistinguishable from a truly random r selected uniformly from {0, 1}$^n$ (not an expansion of a seed).

How do we show this indistinguishability from random? Using **Statistical Tests**<br>

**Statistical Test:**
Let's define what a statistical test on the space {0, 1}$^{n}$ is:<br>
It's basically an algorithm (A) such that:
- A takes an input x (which is an n-bit string) and
- Outputs 0 (meaning the input does not seem random) or 1 (meaning the input seems random)

One can think of any number of statistical tests; therefore, when considering indistinguishability, we only account for efficient statistical tests.

A statistical test uses the concept of Advantage over a PRG to determine whether it can distinguish pseudorandom input from truly random input. The following image shows the formula for the Advantage of a given statistical test A over a generator G.
![](./Images/Advantage-ST.png "G(k) is pseudorandom and r is truly random; Pr is an abbreviation for probability")
Note: We only consider the advantage of efficient statistical tests against the generator (we don't care about inefficient ones). Also, we want the advantage to be negligible, i.e., as close to zero as possible (which indicates that the statistical test was not able to distinguish).
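
Assuming the formula in the image is the usual one, Adv[A, G] = |Pr[A(G(k)) = 1] - Pr[A(r) = 1]|, the advantage can be computed exactly for a deliberately bad toy generator. Here G maps a 4-bit seed to 8 bits by repeating the seed, and the test A outputs 1 when the two halves of its input match; both are invented for this example.

```python
from fractions import Fraction

S, N = 4, 8   # seed length and output length in bits

def G(k: int) -> int:
    """Bad toy generator: output is the 4-bit seed repeated twice."""
    return (k << S) | k

def A(x: int) -> int:
    """Statistical test: output 1 if the high and low halves of x are equal."""
    return 1 if (x >> S) == (x & (2 ** S - 1)) else 0

# Exact probabilities by enumerating every seed and every 8-bit string.
pr_prg = Fraction(sum(A(G(k)) for k in range(2 ** S)), 2 ** S)
pr_random = Fraction(sum(A(r) for r in range(2 ** N)), 2 ** N)
advantage = abs(pr_prg - pr_random)

print(pr_prg, pr_random, advantage)
```

The test fires on every output of G but on only 16 of the 256 truly random strings, so its advantage is 15/16, far from negligible: this G is not a secure PRG.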

Hence, the crypto definition of a secure PRG is as follows:
![](./Images/SecurePRGs.png)
Note: In theory, an efficient algorithm (statistical test) is one that finishes in polynomial time; in practice, it can be regarded as one that finishes within a given time bound.

**Secure PRG is an unpredictable generator and vice versa**

A secure PRG implies: It's unpredictable (which covers the minimal requirement for a secure PRG). 
![](./Images/SecureMeanUnpredict.png)

Also, there exists a theorem proving the converse: an unpredictable generator is secure.
![](./Images/UnpredictMeanSecure.png)

**General Representation:**
![](./Images/GeneralSecure.png)
