passwords and glob ffmpeg in win #15

Closed
wants to merge 8 commits into from

Conversation

dobrosketchkun
Contributor

Implemented a simple password encoding and zip compression. Also, ffmpeg "glob" pattern doesn't work in Windows, so I tried to make a workaround.
Plus, I incorporated #14 by https://github.com/Theelgirl into this.

@Theelx
Collaborator

Theelx commented Oct 9, 2020

That's super interesting! Did you make any benchmarks for speed?

@dobrosketchkun
Contributor Author

Unfortunately I didn't, but when I tried the original fvid and then the version with your modifications, the latter was a few seconds faster on a file of around 1.5 MB (a PDF).

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Also, when I tested it, it said "unrecognized argument: test" when I passed -p "test". I see that passing -p alone makes it ask you for the password. I suggest that you make it let you pass the password in the command line by removing the action parameter, and if you do that, then it doesn't ask for your input.

Edit: A short help description may be nice too even if not needed, for consistency since all the other arguments have a help description.
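For illustration, here is a minimal sketch of how a single -p flag could accept a value on the command line and still fall back to a hidden prompt when given alone. It relies on argparse's nargs="?" together with const; the PROMPT sentinel and the resolve_password helper are hypothetical names invented for this example, not code from the PR.

```python
import argparse
import getpass

# Hypothetical sentinel meaning "-p was given without a value".
PROMPT = object()


def build_parser():
    parser = argparse.ArgumentParser(description="save files as videos")
    parser.add_argument(
        "-p", "--password",
        nargs="?",       # the value after -p is optional
        const=PROMPT,    # used when -p appears with no value
        default=None,    # used when -p is absent entirely
        help="set password (prompts if no value is given)",
    )
    return parser


def resolve_password(args, ask=getpass.getpass):
    """Return the password, prompting only when -p was passed alone."""
    if args.password is PROMPT:
        return ask("Enter password: ")
    return args.password
```

With this shape, `-p test` yields "test", a bare `-p` triggers the prompt, and omitting the flag yields None, so the caller can decide what "no password" means.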

@dobrosketchkun
Contributor Author

dobrosketchkun commented Oct 9, 2020

Do you mean just type your password as plain text? It's not very secure.

A short help description may be nice

I forgot about it, thnx

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Yeah I meant type it as plain text. If it's encrypted by the program, what's the matter? Sorry if this is dumb, I'm not very versed in password security as I've never made programs that have user accounts and/or passwords before.

Edit: There's a stack overflow question about how to use getpass with argparse in the command line, I personally like this answer:
https://stackoverflow.com/a/44416389

@dobrosketchkun
Contributor Author

It's not dumb, just not paranoid enough for my taste. You see, you are right that it's on your PC, but there is a cache of commands, and someone can simply see what you are typing on screen, and that kind of stuff.

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Yeah that's fair. My bash_history file is hidden in Ubuntu 20.04, but I can still see it and technically grab the plaintext password from there. However, the vast majority of users will probably be on Windows, and I don't believe there's an equivalent of bash_history for Windows.

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Here are the lines I changed to make it use getpass on the plaintext flags. I took out the functions that I didn't change. If you'd like, I can add this to my optimization commit.

class Password:
    DEFAULT = 'False'

    def __init__(self, value):
        if value == self.DEFAULT:
            value = getpass.getpass('Enter Password (press enter to skip): ')
        self.value = value

    def __str__(self):
        return self.value

    def __bool__(self):
        return True

def get_password(pwd=False):
    password_provided = pwd
    if not pwd:
        password_provided = getpass.getpass("Enter password:")
    password = str(password_provided).encode()  
    salt = os.urandom(32)
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA512(),
        length=32,
        salt=salt,
        iterations=100000,
        backend=default_backend()
        )
    key = base64.urlsafe_b64encode(kdf.derive(password)) 
    return key

def main():
    parser = argparse.ArgumentParser(description="save files as videos", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        "-e", "--encode", help="encode file as video", action="store_true"
    )
    parser.add_argument(
        "-d", "--decode", help="decode file from video", action="store_true"
    )

    parser.add_argument("-i", "--input", help="input file", required=True)
    parser.add_argument("-o", "--output", help="output path")
    parser.add_argument("-f", "--framerate", help="set framerate for encoding (as a fraction)", default="1/5", type=str)
    parser.add_argument("-p", "--password", help="set password", nargs="?", type=Password, default=Password.DEFAULT)
    args = parser.parse_args()

    setup()

    if args.decode:
        if args.password != "":
            key = get_password(args.password)
        bits = get_bits_from_video(args.input)

        file_path = None

        if args.output:
            file_path = args.output
            
        if args.password:
            save_bits_to_file_crypto(file_path, bits, key)
        else:
            save_bits_to_file(file_path, bits)

    elif args.encode:
        # isdigit() returns False for negative strings such as "-3", so those fall through to the error below
        # all() checks whether both a minus sign and a forward slash are in the string, to reject negative fractions
        if (not args.framerate.isdigit() and "/" not in args.framerate) or all(x in args.framerate for x in ("-", "/")):
            raise NotImplementedError("The framerate must be a positive fraction or an integer for now, like 3, '1/3', or '1/5'!")
        # get bits from file
        if args.password != "":
            key = get_password(args.password)
            bits = get_bits_from_file_crypto(args.input, key)
        else:
            bits = get_bits_from_file(args.input)

        # create image sequence
        image_sequence = make_image_sequence(bits)

        # save images
        for index in range(len(image_sequence)):
            image_sequence[index].save(
                f"{FRAMES_DIR}encoded_frames_{index}.png"
            )

        video_file_path = None

        if args.output:
            video_file_path = args.output

        make_video(video_file_path, image_sequence, args.framerate)

    cleanup()

@dobrosketchkun
Contributor Author

It looks like a nice compromise!

If you'd like, I can add this to my optimization commit.

It'll be nice as long as it isn't confusing for AlfredoSequeida and it is tested with the full password code modifications.

@Theelx
Collaborator

Theelx commented Oct 9, 2020

I tested it with ASCII and UTF-8 passwords, with an empty password flag, and with no password flag. They all work, as long as the encoding and decoding passwords are the same. However, this version requires a password every time, so I added "Press Enter to skip", which basically makes the password the enter key.

@Theelx
Collaborator

Theelx commented Oct 9, 2020

I'm getting this when I try to decrypt videos generated with a password, do you know what could cause it? Are you intending to gzip files by default?
gzip.BadGzipFile: Not a gzipped file (b'\xff\xd8')

Also, when I enter a password for decryption but didn't enter one for encryption, it gives me a file back? Shouldn't it raise an error? Never mind, it's because I made another modification that I forgot about: to prevent the gzip error, it sends the file to the normal save_bits_to_file instead of the crypto version when possible.

@dobrosketchkun
Contributor Author

Here is another modification I want to add in order to get rid of the Magic module, which is not very user-friendly on Windows:

import pickle

def get_bits_from_file_crypto(filepath, key):
    bitarray = BitArray(filename=filepath)
    bitarray.append(DELIMITER)
    message = pickle.dumps({'filename': filepath, 'data' : str(bitarray.bin)}) # <--------------------
    # message = str(bitarray.bin).encode()
    f = Fernet(key)
    encrypted = f.encrypt(message)
    #zip
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='w') as fo:
        fo.write(encrypted)
    encrypted_zip = out.getvalue()
    #zip
    
    
    bitarray2 = BitArray(encrypted_zip)
    print('Bits are in place')
    return bitarray2.bin

# <....>

def save_bits_to_file_crypto(file_path, bits, key):
    bitstring_temp = Bits(bin=bits)
    encrypted = bitstring_temp.tobytes()

    #zip
    in_ = io.BytesIO()
    in_.write(encrypted)
    in_.seek(0)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        encrypted = fo.read()
    #zip

    f = Fernet(key)
    decrypted_bits = f.decrypt(encrypted)#.decode()
    _dict = pickle.loads(decrypted_bits) # <------------------------------------------------------------
    filename = _dict['filename']
    decrypted_bits_with_tail = _dict['data']
    
    bitstring_with_tail = Bits(bin=decrypted_bits_with_tail)
    bitstring_with_tail = bitstring_with_tail.bin
    # print('decoded_bitstring', bitstring_with_tail)
    delimiter_str = DELIMITER.replace("0b", "")
    delimiter_length = len(delimiter_str)

    if bitstring_with_tail[-delimiter_length:] == delimiter_str:
        bitstring_with_tail = bitstring_with_tail[: len(bitstring_with_tail) - delimiter_length]

    bitstring = Bits(bin=bitstring_with_tail)
    
    # mime = Magic(mime=True)
    # mime_type = mime.from_buffer(bitstring.tobytes())

    if file_path is None:
        filepath = filename
    else:
        filepath = file_path

    with open(
        filepath, "wb"
    ) as f:
        bitstring.tofile(f)

#<...>

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Why are you using gzip? Compression will lose us bits, right? I had to add a modification to save_bits_to_file_crypto to make it raise an error on invalid passwords, but now it raises an error when the correct password is entered also, so I want to avoid that:

class WrongPassword(Exception):
    pass

def save_bits_to_file_crypto(file_path, bits, key):
    bitstring_temp = Bits(bin=bits)
    encrypted = bitstring_temp.tobytes()

    #zip
    in_ = io.BytesIO()
    in_.write(encrypted)
    in_.seek(0)
    if file_path is None:
        bitstring = Bits(bin=bits)
        mime = Magic(mime=True)
        mime_type = mime.from_buffer(bitstring.tobytes())
        file_path = f"file{mimetypes.guess_extension(type=mime_type)}"
    with open(file_path, 'rb') as fo:
        encrypted = fo.read()
    #zip

    f = Fernet(key)
    try:
        decrypted_bits = f.decrypt(encrypted).decode()
    except cryptography.fernet.InvalidToken:  # the module is lowercase "fernet"; needs "import cryptography.fernet"
        raise WrongPassword("That's not the password used to encrypt the file!")
    bitstring_with_tail = Bits(bin=decrypted_bits)
    bitstring_with_tail = bitstring_with_tail.bin
    # print('decoded_bitstring', bitstring_with_tail)
    delimiter_str = DELIMITER.replace("0b", "")
    delimiter_length = len(delimiter_str)

    if bitstring_with_tail[-delimiter_length:] == delimiter_str:
        bitstring_with_tail = bitstring_with_tail[: len(bitstring_with_tail) - delimiter_length]

    bitstring = Bits(bin=bitstring_with_tail)
    
    mime = Magic(mime=True)
    mime_type = mime.from_buffer(bitstring.tobytes())

    if file_path is None:
        filepath = f"file{mimetypes.guess_extension(type=mime_type)}"
    else:
        filepath = file_path

    with open(
        filepath, "wb"
    ) as f:
        bitstring.tofile(f)

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Ah, something is up with generating the key. An example key generated with encoding the password "test" is b'2v99r7msWq2ZsLM27WS_LxVmzd5rfzmOKiMcbKgA_z4='
and the key that it tries to get from the image is
b'MxhcM4dL6HLmJSLRnMWyFJWqShIq6gsqT6u3wNNyuj0='

Edit:
It's the urandom salt in get_password. Making the salt static makes the keys the same. However, InvalidToken is still raised.
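To illustrate why a fresh random salt breaks decoding, and one common way around it, here is a minimal sketch that stores the salt next to the payload so the decoder can re-derive the same key. It uses the standard library's hashlib.pbkdf2_hmac as a stand-in for the PBKDF2HMAC class used in the thread; the helper names and the "salt first, payload after" layout are assumptions made for this example only.

```python
import hashlib
import os

SALT_LEN = 32          # same salt size as os.urandom(32) in get_password
ITERATIONS = 100_000   # same iteration count as in the thread


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key; identical password and salt give an identical key."""
    return hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS, dklen=32)


def new_key_and_salt(password: str):
    """Encoding side: pick a fresh salt and derive the key from it."""
    salt = os.urandom(SALT_LEN)
    return derive_key(password, salt), salt


def pack(salt: bytes, payload: bytes) -> bytes:
    """Prepend the (non-secret) salt so the decoder can recover it."""
    return salt + payload


def unpack(blob: bytes):
    """Decoding side: split the stored salt from the payload."""
    return blob[:SALT_LEN], blob[SALT_LEN:]
```

The salt does not need to be secret, only available at decode time; a hard-coded salt also makes the keys match, but then every file shares the same salt.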

@dobrosketchkun
Contributor Author

Why are you using gzip?

I use gzip because the encoding algorithm turns a 1.5 MB file into a 60 MB video without gzip and into a 45 MB video with it, and the bigger the video, the longer the decoding takes.

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Yes, but is gzip lossless? Will we lose any bits by using gzip?

@dobrosketchkun
Contributor Author

Well, it should be:

https://www.gzip.org/
https://zlib.net/

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Ok, well either way, when I try to decode it using your code in the most recent comment, I get this error:

  File "./fvid.py", line 353, in main
    save_bits_to_file_crypto(file_path, bits, key)
  File "./fvid.py", line 211, in save_bits_to_file_crypto
    encrypted = fo.read()
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'\xff\xd8')

@dobrosketchkun
Contributor Author

dobrosketchkun commented Oct 9, 2020

Well, since your implementation of the password dialogue is better than mine, but it doesn't seem compatible with compression/decompression, I guess we just get rid of gzip for now. The question is where it's better to be, in my push or yours?

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Both of our creations are bugged with gzip, and I have no clue how to debug the gzip as I only understand part of what you did. What I'd do if I were you would be to figure out the error, fix it, and patch the solution to this branch (I can't, because I don't fully understand the crypto stuff you did).

So, to answer your question, since it's not working for either of us, keep it in this push until you can fix it.

gzip doesn't work properly with some of the future modifications to password cryptography, and the Magic module is just not needed.
@dobrosketchkun
Contributor Author

Commented gzip out for now and the magic module too. Also, I did some minor tweaks with non-crypto variants of functions.

@Theelx
Collaborator

Theelx commented Oct 9, 2020

Getting an InvalidToken error with any password now.

File "./fvid.py", line 360, in main
save_bits_to_file_crypto(file_path, bits, key)
File "./fvid.py", line 221, in save_bits_to_file_crypto
decrypted_bits = f.decrypt(encrypted)#.decode()
File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/cryptography/fernet.py", line 74, in decrypt
timestamp, data = Fernet._get_unverified_token_data(token)
File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/cryptography/fernet.py", line 92, in _get_unverified_token_data
raise InvalidToken
cryptography.fernet.InvalidToken

Also, instead of separate functions for crypto and non-crypto versions, how about adding a boolean crypto argument to the non-crypto version and removing the crypto? It could result in a lot less code if done right.
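As a rough sketch of that suggestion, the crypto and non-crypto paths could share one function each way, with the encryption step passed in as an optional callable. Everything here is hypothetical: the function names are invented, and xor_placeholder is explicitly not encryption, only a stand-in so the example runs without third-party packages.

```python
from typing import Callable, Optional

Transform = Callable[[bytes], bytes]


def bytes_to_bits(data: bytes, encrypt: Optional[Transform] = None) -> str:
    """Return data as a string of '0'/'1', encrypting first if a transform is given."""
    if encrypt is not None:
        data = encrypt(data)
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str, decrypt: Optional[Transform] = None) -> bytes:
    """Inverse of bytes_to_bits; expects a bit count that is a multiple of 8."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    if decrypt is not None:
        data = decrypt(data)
    return data


def xor_placeholder(data: bytes) -> bytes:
    # NOT encryption: a reversible stand-in used only to exercise the optional step.
    return bytes(b ^ 0x5A for b in data)
```

A real version would pass something like a Fernet object's encrypt and decrypt methods instead of the placeholder, keeping the bit-handling code in one place.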

@Theelx
Collaborator

Theelx commented Oct 9, 2020

The modifications you made to the non-crypto functions resulted in a lot of ffmpeg diagnostics cluttering the screen and a file size 10x bigger. Do you know what changes could be causing this?

@AlfredoSequeida
Owner

AlfredoSequeida commented Oct 9, 2020

@dobrosketchkun This is awesome! I just read through the conversation. Let me digest this and I will get back to you!

Owner

@AlfredoSequeida AlfredoSequeida left a comment


That decryption issue has to be fixed before I can test my other assumptions:

To make sure we are on the same page, let's use this file for testing:
https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png

Other than that, this seems like it will be a really cool feature, good job!

fvid/fvid.py Outdated
delimiter_str = DELIMITER.replace("0b", "")
delimiter_length = len(delimiter_str)

if bitstring_with_tail[-delimiter_length:] == delimiter_str:
Owner


I don't think this will work, which is the same as what is on line 184, but because of the error on 218, I haven't tested it.

The reason is that when get_bits_from_video is called on data that has been encrypted, the delimiter is never found, so if you look at the output in the fvid_frames directory, we might actually have duplicate frames. Ultimately, when save_bits_to_file_crypto is called, there is more data in there than the file originally had, since all the frames (including any duplicate frames) become part of the data.

For reference purposes, here is the file I used to test it: https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png

The program ran for 20 frames for me (as indicated by the number of loading bars when getting frames), but if you stop the program while it is decoding, you will find that in the fvid_frames directory there are actually only 12 frames that contain data (frame #12 has data in only part of the frame and the rest is black):

decoded_frames012

So again, if we are getting the data for all 20 frames, then even if we try to find the delimiter, we might have extra data.

But it's hard to say for sure without being able to decode the data, because of the issue I am getting. So if you fix that, we should be able to see whether I am correct or not.

@dobrosketchkun
Contributor Author

dobrosketchkun commented Oct 10, 2020

First of all, I don't know the pull request section of GitHub very well, so I may have pushed a button I didn't need to push, lol.

Anyway, I rewrote all the code for better clarity.

I checked gzip losslessness by using this code:

import gzip
import io
import os
import hashlib
from tqdm import tqdm

same = 0
diff = 0
size = 100000
times = 1000000

for _ in tqdm(range(times)):
    random_message = os.urandom(size)
    hash_orig = hashlib.sha256()
    hash_orig.update(random_message)
    hash_orig = hash_orig.hexdigest()


    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='w') as fo:
        fo.write(random_message)
    random_zip = out.getvalue()


    in_ = io.BytesIO()
    in_.write(random_zip)
    in_.seek(0)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        random_unzip = fo.read()

    hash_unzip = hashlib.sha256()
    hash_unzip.update(random_unzip)
    hash_unzip = hash_unzip.hexdigest()


    if hash_orig == hash_unzip:
        same += 1
    elif hash_orig != hash_unzip:
        diff += 1
    else:
        print('WTF', hash_orig, hash_unzip)

print({'times' : times,
        'size' : size,
        'same' : same,
        'diff' : diff
})

result:

>py gzip_test.py
100%|█████████████████████████████████████████████████████████████████████| 1000000/1000000 [1:57:47<00:00, 141.49it/s]
{'times': 1000000, 'size': 100000, 'same': 1000000, 'diff': 0}

So, I'm pretty sure it's safe to say that python gzip is lossless.

Here are the changes I made to functions in the original variant of the code:

make_video() - the "glob" pattern doesn't work on Windows, so I made a workaround
get_password() from Theelgirl - the salt needs to be the same within one encode/decode cycle
get_bits_from_image() - with my modifications, it no longer needs to find DELIMITER
save_bits_to_file() - new way to find the DELIMITER after decrypting
save_bits_to_file() - filenames without funky mime magic
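For illustration, the glob workaround amounts to giving ffmpeg a numbered-sequence pattern (%d) instead of a glob. The sketch below builds the same ffmpeg arguments that appear later in this thread, but as a list for subprocess.run rather than a concatenated os.system string, which is an assumption of this example and avoids quoting problems with paths containing spaces. The function names are hypothetical.

```python
import os
import subprocess

FRAMES_DIR = "./fvid_frames/"


def build_ffmpeg_command(output_path, framerate="1/5", frames_dir=FRAMES_DIR):
    """Return the ffmpeg argv for turning numbered PPM frames into a video."""
    # %d is expanded by ffmpeg itself (1, 2, 3, ...), so no shell glob is needed.
    pattern = os.path.join(frames_dir, "encoded_frames_%d.ppm")
    return [
        "ffmpeg",
        "-r", framerate,
        "-i", pattern,
        "-c:v", "libx264rgb",
        output_path,
    ]


def make_video_from_frames(output_path="file.mp4", framerate="1/5"):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(output_path, framerate), check=True)
```

Because each argument is a separate list element, an output path such as "my video.mp4" is passed to ffmpeg intact.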

So, basically, the filename is contained in the pickle structure, along with the encrypted data and the cryptographic tag. If you don't use the "-p" flag, you are really using a password anyway: the default one.

I checked this code on Windows 10 (python 3.7.4) and Arch (python 3.8.6)

@Theelx
Collaborator

Theelx commented Oct 10, 2020

I still get this when decoding mp4s containing an encoded Lenna without using the -p flag on Ubuntu 20.04, Python 3.8.3:

  File "./fvid.py", line 331, in main
    save_bits_to_file(file_path, bits, key)
  File "./fvid.py", line 179, in save_bits_to_file
    bitstring = fo.read()
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'LG')

In addition, it takes upwards of 55 seconds to read the Lenna mp4 with ffmpeg, which is a serious performance regression, even worse than it was in the original version without any optimizations. It takes 30 seconds to read an encoded mp4 of one of my jpg files, where it took under a second without gzip/cryptography, so there's a huge overhead in ffmpeg processing the gzip files.

Edit: It seems to have been erroring and taking a long time because the fvid_frames directory was still using frames from Lenna's test, it works well for me now.

Edit 2: I don't know how I got it to work previously. I just did it again on Lenna, and it put 300 files taking up 600MB of disk in the fvid_frames folder before running the decoder and giving me the traceback pasted earlier. Here are some diagnostics:
Screenshot (553)
Screenshot (552)

- FFmpeg python is gone.
- make_image_sequence() implements new logic to avoid memory crashes with big files.
- minor improvements to get_password
- various garbage collection improvements
@AlfredoSequeida
Owner

@dobrosketchkun I like the idea of compressing the data, assuming we can get the original data back, of course; it makes a lot of sense. As soon as I have some time I will test your changes. Thank you!

@Theelx
Collaborator

Theelx commented Oct 11, 2020

I can vouch for them working on Ubuntu and Windows if that helps

@dobrosketchkun
Contributor Author

@AlfredoSequeida btw, in a theoretical situation where you want to keep only one thing from all of this, it needs to be password encryption, because one cannot just put their stuff in public places without protection.

@dobrosketchkun
Contributor Author

@Theelgirl @AlfredoSequeida I think I found a way to reduce the file encoding time drastically. Let's say we are talking about this file - https://archive.org/download/LowEndCo1985/LowEndCo1985_64kb.mp4 because Lenna.png is too small to see the difference. My current code, with some optimizations on top of the original, encodes it in 1h 5min on my Win10 machine, but with a new approach, it only takes 3min 13s! (Decoding in both variants takes around 17 mins.)

How it's possible
The main bottleneck in the original approach is this:

bit_sequence = split_list_by_n(list(map(int, bitstring)), width * height)

You need to take every character, '1' or '0', and convert it to an int. It takes ages, but PIL only understands bytes. I was thinking about it and suddenly remembered the existence of a text-based image format: the P3 version of PPM. An example of a small file from the spec:

P3
# feep.ppm
4 4
15
 0  0  0    0  0  0    0  0  0   15  0 15
 0  0  0    0 15  7    0  0  0    0  0  0
 0  0  0    0  0  0    0 15  7    0  0  0
15  0 15    0  0  0    0  0  0    0  0  0

To make a P3 PPM file, you need a magic phrase, "P3", then the width and height on a new line, and then another line with the maximum color value (arbitrary, but less than 65536 and greater than 0). After that, you put lines of pixels in R G B R G B R G B ... format; not very sophisticated.

That is exactly what the new make_image_sequence() does.
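As a small illustration of the P3 layout described above, here is a sketch that turns a bitstring into the text of one black-and-white frame. It assumes a maximum color value of 1, so each bit is simply repeated three times as R, G and B; bits_to_ppm is a hypothetical helper, not the PR's make_image_sequence().

```python
def bits_to_ppm(bits: str, width: int, height: int) -> str:
    """Build the text of a P3 PPM frame from a string of '0'/'1' characters."""
    if len(bits) > width * height:
        raise ValueError("too many bits for one frame")
    # Pad the last frame with black pixels so every row is complete.
    padded = bits.ljust(width * height, "0")
    rows = []
    for y in range(height):
        row = padded[y * width:(y + 1) * width]
        # With maxval 1, a '1' bit becomes the white pixel "1 1 1" and '0' becomes "0 0 0".
        rows.append(" ".join(" ".join(bit * 3) for bit in row))
    header = f"P3\n{width} {height}\n1\n"
    return header + "\n".join(rows) + "\n"
```

For example, `bits_to_ppm("10", 2, 1)` produces a 2x1 image with one white and one black pixel. No per-bit int conversion is needed, since the characters are written out as text directly.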

The only con is that it's quite pricey in temporary file volume: for this file, the frames take around 1.5 GB (300 for the PNG variant).

PS
I also figured out (thanks to @zavok) that with gzip you don't really need a delimiter; gzip cuts the data off by itself.
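The point about gzip finding its own end can be checked with a short sketch: a gzip stream records where it finishes, so zero padding appended after it (such as the black pixels filling the last frame) can be separated from the payload. This example uses zlib.decompressobj in gzip mode rather than the gzip.GzipFile calls from the thread, and gunzip_with_padding is a hypothetical name.

```python
import gzip
import zlib


def gunzip_with_padding(blob: bytes):
    """Decompress one gzip stream and return (payload, trailing_bytes)."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    payload = decompressor.decompress(blob)
    if not decompressor.eof:
        raise ValueError("gzip stream is truncated")
    # Whatever follows the end of the stream is left untouched here.
    return payload, decompressor.unused_data
```

Feeding it `gzip.compress(data) + b"\x00" * 64` returns the original data plus the 64 padding bytes as the second element, so no separate delimiter is needed to find where the file ends.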

The code:

from bitstring import Bits, BitArray
from PIL import Image
import glob

from operator import sub
import numpy as np
from tqdm import tqdm


import binascii

import argparse
import sys
import os

import getpass 

import io
import gzip
import pickle

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from Crypto.Cipher import AES

from time import time

#DELIMITER = bin(int.from_bytes("HELLO MY NAME IS ALFREDO".encode(), "big"))
FRAMES_DIR = "./fvid_frames/"
SALT = '63929291bca3c602de64352a4d4bfe69'.encode()  # It needs to be the same within one encode/decode cycle
DEFAULT_KEY = ' '*32
DEFAULT_KEY = DEFAULT_KEY.encode()
NOTDEBUG = True

class WrongPassword(Exception):
    pass

class MissingArgument(Exception):
    pass

def get_password(password_provided):
    if password_provided=='default':
        return DEFAULT_KEY
    else:
        if password_provided is None:
            password_provided = getpass.getpass("Enter password:")

        password = str(password_provided).encode()  
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA512(),
            length=32,
            salt=SALT,
            iterations=100000,
            backend=default_backend()
            )
        key = kdf.derive(password)
        return key



def get_bits_from_file(filepath, key):
    print('Reading file...')
    bitarray = BitArray(filename=filepath)
    # adding a delimiter to know when the file ends to avoid corrupted files
    # when retrieving
    # bitarray.append(DELIMITER)

    cipher = AES.new(key, AES.MODE_EAX, nonce=SALT)
    ciphertext, tag = cipher.encrypt_and_digest(bitarray.tobytes())
    
    filename = os.path.basename(filepath)
    pickled = pickle.dumps({'tag':tag,
                            'data':ciphertext,
                            'filename':filename})  # store the base name computed above, not the full input path
    print('Zipping...')
    #zip
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='w') as fo:
        fo.write(pickled)
    zipped = out.getvalue()  # named "zipped" so the builtin zip() is not shadowed
    #zip
    
    del bitarray
    del pickled

    bitarray = BitArray(zipped)
    return bitarray.bin

def less(val1, val2):
    return val1 < val2

def get_bits_from_image(image):
    width, height = image.size

    done = False

    px = image.load()
    bits = ""

    pbar = tqdm(range(height), desc="Getting bits from frame")

    white = (255, 255, 255)
    black = (0, 0, 0)
    
    for y in pbar:
        for x in range(width):

            pixel = px[x, y]

            pixel_bin_rep = "0"

            # for exact matches
            if pixel == white:
                pixel_bin_rep = "1"
            elif pixel == black:
                pixel_bin_rep = "0"
            else:
                white_diff = tuple(map(abs, map(sub, white, pixel)))
                # min_diff = white_diff
                black_diff = tuple(map(abs, map(sub, black, pixel)))


                # if the white difference is smaller, that means the pixel is closer
                # to white, otherwise, the pixel must be black
                if all(map(less, white_diff, black_diff)):
                    pixel_bin_rep = "1"
                else:
                    pixel_bin_rep = "0"

            # adding bits
            bits += pixel_bin_rep

    return (bits, done)


def get_bits_from_video(video_filepath):
    # get image sequence from video
    print('Reading video...')
    image_sequence = []

    os.system('ffmpeg -i ' + video_filepath + ' ./fvid_frames/decoded_frames_%d.png');

    # for filename in glob.glob(f"{FRAMES_DIR}decoded_frames*.png"):
    for filename in sorted(glob.glob(f"{FRAMES_DIR}decoded_frames*.png"), key=os.path.getmtime) :
        image_sequence.append(Image.open(filename))

    bits = ""
    sequence_length = len(image_sequence)
    print('Bits are in place')
    for index in tqdm(range(sequence_length)):
        b, done = get_bits_from_image(image_sequence[index])

        bits += b

        if done:
            break

    return bits


def save_bits_to_file(file_path, bits, key):
    # get file extension

    bitstring = Bits(bin=bits)

    #zip
    print('Unzipping...')
    in_ = io.BytesIO()
    in_.write(bitstring.bytes)
    in_.seek(0)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        bitstring = fo.read()
    #zip


    unpickled = pickle.loads(bitstring)
    tag = unpickled['tag']
    ciphertext = unpickled['data']
    filename = unpickled['filename']
    
    cipher = AES.new(key, AES.MODE_EAX, nonce=SALT)
    bitstring = cipher.decrypt(ciphertext)
    print('Checking integrity...')
    try:
        cipher.verify(tag)
        # print("The message is authentic")
    except ValueError:
        raise WrongPassword("Key incorrect or message corrupted")

    bitstring = BitArray(bitstring)

    
    # _tD = Bits(bin=DELIMITER) # New way to find a DELIMITER
    # _tD = _tD.tobytes()
    # _temp = list(bitstring.split(delimiter=_tD))
    # bitstring = _temp[0]


    # If filepath is not passed in, use the stored filename;
    #    otherwise use the passed-in filepath
    if file_path is None:
        filepath = filename
    else:
        filepath = file_path # No need for mime Magic

    with open(
        filepath, "wb"
    ) as f:
        bitstring.tofile(f)



def split_list_by_n(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i : i + n]

def pix(bin):
    if bin == '1':
        return '255'
    else:
        return '0'


def make_image_sequence(bitstring, resolution=(1920, 1080)):
    width, height = resolution
    maxval = 1
    # split bits into sets of width*height to make (1) image
    set_size = width * height

    # bit_sequence = []
    print('Making image sequence')
    print('Cutting...')
    bitlist = list(tqdm(split_list_by_n(bitstring, set_size)))
    
    del bitstring
    
    bitlist[-1] = bitlist[-1] + '0'*(set_size - len(bitlist[-1]))

    bitlist = bitlist[::-1]
    ppm_header = f'P3 \n{width} {height} \n{maxval}\n'
    
    
    print('Saving frames...')
    for index in tqdm(range(len(bitlist))):
        bitl = bitlist.pop()
        # print('bitl', bitl)
        bitl = list(split_list_by_n(bitl, width))
        bitl = [' '.join([' '.join([_]*3) for _ in list(row)]) for row in bitl]
        image = ppm_header + '\n'.join(bitl)
        path = f"{FRAMES_DIR}encoded_frames_{index+1}.ppm"
        with open(path, 'w') as f:
            f.write(image)


def make_video(output_filepath, framerate="1/5"):

    if output_filepath is None:
        outputfile = "file.mp4"
    else:
        outputfile = output_filepath

    os.system('ffmpeg -r ' + framerate + ' -i ./fvid_frames/encoded_frames_%d.ppm -c:v libx264rgb ' + outputfile)



def cleanup():
    # remove frames
    import shutil

    shutil.rmtree(FRAMES_DIR)


def setup():
    import os

    if not os.path.exists(FRAMES_DIR):
        os.makedirs(FRAMES_DIR)


def main():
    parser = argparse.ArgumentParser(description="save files as videos")
    parser.add_argument(
        "-e", "--encode", help="encode file as video", action="store_true"
    )
    parser.add_argument(
        "-d", "--decode", help="decode file from video", action="store_true"
    )

    parser.add_argument("-i", "--input", help="input file", required=True)
    parser.add_argument("-o", "--output", help="output path")
    parser.add_argument("-f", "--framerate", help="set framerate for encoding (as a fraction)", default="1/5", type=str)
    parser.add_argument("-p", "--password", help="set password", nargs="?", type=str, default='default')

    args = parser.parse_args()

    setup()
    # print(args)
    # print('PASSWORD', args.password, [len(args.password) if len(args.password) is not None else None for _ in range(0)])
    
    if not args.decode and not args.encode:
        raise MissingArgument('You should use either --encode or --decode!') #check for arguments
    
    key = get_password(args.password)
    
    if args.decode:
        bits = get_bits_from_video(args.input)

        file_path = None

        if args.output:
            file_path = args.output

        save_bits_to_file(file_path, bits, key)

    elif args.encode:
        # str.isdigit() is False for a negative number such as "-3", so negative integers are rejected here
        # all() checks whether both the negative sign and the forward slash are in the string, to reject negative fractions
        if (not args.framerate.isdigit() and "/" not in args.framerate) or all(x in args.framerate for x in ("-", "/")):
            raise NotImplementedError("The framerate must be a positive fraction or an integer for now, like 3, '1/3', or '1/5'!")
        # get bits from file
        bits = get_bits_from_file(args.input, key)

        # create image sequence
        make_image_sequence(bits)

        # save images
        # for index in range(len(image_sequence)):
            # image_sequence[index].save(
                # f"{FRAMES_DIR}encoded_frames_{index}.png"
            # )

        video_file_path = None

        if args.output:
            video_file_path = args.output

        make_video(video_file_path, args.framerate)
    
    # cleanup()


time1 = time()
main()
print('Time: ', time() - time1)
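
For reference, a minimal sketch of how the `-p` option defined above behaves with `nargs="?"`: omitting the flag yields the default, a bare `-p` yields `None`, and `-p value` yields the value. The `resolve_password` helper is hypothetical and only stands in for the PR's `get_password`, whose body is not shown here; the assumption is that it prompts when it receives `None`.

```python
import argparse
import getpass


def build_parser():
    parser = argparse.ArgumentParser(description="password option sketch")
    # nargs="?" makes the value optional: a bare "-p" yields const (None by
    # default), while omitting "-p" entirely yields default.
    parser.add_argument("-p", "--password", nargs="?", type=str, default="default")
    return parser


def resolve_password(value, prompt=getpass.getpass):
    # Hypothetical helper: prompt only when "-p" was given without a value.
    if value is None:
        return prompt("Enter password: ")
    return value


parser = build_parser()
no_flag = parser.parse_args([]).password               # flag omitted
bare_flag = parser.parse_args(["-p"]).password         # flag without a value
with_value = parser.parse_args(["-p", "test"]).password
```

With this setup a password typed on the command line is accepted, and the interactive prompt is still available for anyone who prefers not to leave the password in their shell history.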

@dobrosketchkun dobrosketchkun mentioned this pull request Oct 12, 2020
@AlfredoSequeida
Copy link
Owner

@dobrosketchkun I am going through the pull requests right now. Has the decoding part been fixed yet? I would love to test it. I also agree that for public platforms, password encryption is a must.

Drastically reducing time for encoding
@dobrosketchkun
Copy link
Contributor Author

@AlfredoSequeida try the latest variant; as I said, it's super fast with bigger files.

@Theelx
Copy link
Collaborator

Theelx commented Oct 13, 2020

@dobrosketchkun I tested with a 1.2 MB JPG on Ubuntu 20.04, as the file you suggested wouldn't load on my computer. Encoding takes 8 seconds with the crypto program and 6 seconds with your version. However, your version takes up 60 MB in fvid_frames, while the crypto one takes 1.3 MB. So in this test yours is slightly faster, but it uses about 50x more disk.

Since that file is only 1.2 MB and your program might work best on larger files, I also tested a 26 MB JPG (I can't upload it because images above 10 MB aren't allowed: http://eoimages.gsfc.nasa.gov/images/imagerecords/73000/73751/world.topo.bathy.200407.3x21600x10800.jpg). Your program took 147 seconds to run and used 1.3 GB of disk, while the crypto version took 258 seconds and 27 MB of disk.

But here's where it gets interesting. With tqdm (the progress bar) removed, the crypto version takes only 76.8 seconds to run, compared to 102 seconds for your version, and both use the same amount of memory.

In conclusion, according to my tests, the big speed bottleneck in large image processing is not the algorithm used but the progress bar. With the progress bar removed, your crypto version is significantly faster than the PPM P3 version on large images, and uses less memory.


@Theelx
Copy link
Collaborator

Theelx commented Oct 13, 2020

Side note, to keep some sort of progress bar showing:
By removing all the tqdm calls except this line in make_image_sequence:

    for _ in tqdm(range(len(bitlist))):

I was able to speed up the crypto version's encoding by 3x, while the PPM P3 version was only sped up by about 50%. This made the crypto version nearly twice as fast as the PPM P3 version while using under 2% of the disk space, on my Ubuntu 20.04 system on a single core of my 4-core 3.8GHz Ryzen processor.
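
A minimal sketch of the idea above, assuming the cost comes from updating progress inside the innermost loops: report progress once per frame in the outer loop rather than once per bit. It uses only the standard library; `split_by_n`, `encode_frames` and `on_frame` are hypothetical names, and in the PR the outer loop would be wrapped in `tqdm(...)` rather than calling a callback.

```python
def split_by_n(seq, n):
    """Yield consecutive chunks of length n (the last one may be shorter)."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


def encode_frames(bits, frame_size, on_frame=None):
    """Group bits into frames, reporting progress once per frame."""
    frames = []
    for index, chunk in enumerate(split_by_n(bits, frame_size), start=1):
        # The per-bit work stays in a plain comprehension with no progress hook.
        frames.append([int(b) for b in chunk])
        if on_frame is not None:
            on_frame(index)  # one update per frame, not per bit
    return frames


updates = []
frames = encode_frames("1011001110", frame_size=4, on_frame=updates.append)
```

Here ten bits produce three progress updates instead of ten; with real frame sizes the ratio is far larger, which is consistent with the speedup reported above.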

Welp, the previous variant was better, but without tqdm, so here we are
@dobrosketchkun
Copy link
Contributor Author

Whoa, that is indeed the case, go figure. So I reverted to the previous variant, with new modifications to the delimiter and tqdm usage.

@Theelgirl to finalize it, please combine this encoding approach with your (I assume Cython-based) decoding approach, and we will get something very fast.

you really need tqdm in this plase
@Theelx
Copy link
Collaborator

Theelx commented Oct 13, 2020

@dobrosketchkun Sounds good, I'll submit a new PR with both our approaches.
Edit: #21

@dobrosketchkun
Copy link
Contributor Author

@Theelgirl
So tl;dr my approach includes:

  • password encryption - optional (if you skip it, a default password is used); pass it as "-p your_password", or use "-p" alone to enter it with getpass()
  • zipping (side effect: no delimiter is needed)
  • pickling (filename, tag for checks, and zipped data)
  • no magic or mime (the filename is extracted from the pickled dictionary or from args)
  • new logic in make_image_sequence() to help with bigger files and memory issues
  • no ffmpeg, because it's unnecessary and because pattern_type="glob" is not supported on Windows.

Maybe I forgot something, but that's most of it.
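
The zipping and pickling steps in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `zlib` stands in for whatever compression the PR uses, the `b"FVID"` tag and the function names are hypothetical, and the encryption step is left out because it needs a third-party cipher library.

```python
import hashlib
import pickle
import zlib


def derive_key(password):
    # Assumption: a plain SHA-256 digest used as a 32-byte key. A real tool
    # should prefer a salted KDF such as hashlib.scrypt or hashlib.pbkdf2_hmac.
    return hashlib.sha256(password.encode("utf-8")).digest()


def pack(filename, data, tag=b"FVID"):
    # Compress first, then pickle the metadata together with the payload.
    payload = {"filename": filename, "tag": tag, "data": zlib.compress(data)}
    return pickle.dumps(payload)


def unpack(blob, tag=b"FVID"):
    # Only unpickle data you trust: pickle can execute code while loading.
    payload = pickle.loads(blob)
    if payload.get("tag") != tag:
        raise ValueError("unexpected tag")
    return payload["filename"], zlib.decompress(payload["data"])


original = b"hello " * 100
blob = pack("report.pdf", original)
name, data = unpack(blob)
key = derive_key("default")
```

Because the filename travels inside the pickled dictionary, the decoder does not need to guess the file type, and because the pickle stream is self-delimiting, no separate end-of-data delimiter is required.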

@Theelx
Copy link
Collaborator

Theelx commented Oct 13, 2020

Got it, I removed python-magic and ffmpeg from required imports in setup.py to adjust for that.

@dobrosketchkun
Copy link
Contributor Author

@Theelgirl and I combined our code into one pull request, #21
