Metadatalysis

A simple Python tool to extract and analyze metadata from a set of documents. It relies on exiftool to read metadata, and can also extract metadata from images embedded in pdf, docx, pptx and xlsx files. Tested on Linux only.
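To illustrate the exiftool dependency, here is a minimal sketch of how a Python script could call exiftool and read its JSON output. This is not the code used by metadatalysis.py; the function names (build_exiftool_command, parse_exiftool_json, read_metadata) are hypothetical. The only exiftool features assumed are the -json and -G options and the SourceFile key that exiftool adds to every JSON entry.

```python
import json
import subprocess
from typing import Dict, List


def build_exiftool_command(paths: List[str]) -> List[str]:
    """Return an exiftool command line that prints metadata as JSON."""
    # -json: print one JSON object per file, wrapped in a JSON array.
    # -G: prefix each tag name with its group (for example "PDF:Author").
    return ["exiftool", "-json", "-G", *paths]


def parse_exiftool_json(raw: str) -> Dict[str, dict]:
    """Index exiftool's JSON array by the SourceFile of each entry."""
    result = {}
    for entry in json.loads(raw):
        tags = dict(entry)
        source = tags.pop("SourceFile")
        result[source] = tags
    return result


def read_metadata(paths: List[str]) -> Dict[str, dict]:
    """Run exiftool (assumed to be on PATH) and return {file: {tag: value}}."""
    proc = subprocess.run(
        build_exiftool_command(paths),
        capture_output=True,
        text=True,
        check=False,  # exiftool may exit non-zero while still printing JSON
    )
    return parse_exiftool_json(proc.stdout) if proc.stdout.strip() else {}
```

Calling read_metadata(["report.pdf"]) would return a dictionary keyed by file path, provided exiftool is installed; the file name here is only an example.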

Usage

usage: metadatalysis.py [-h] [--output OUTPUT] [--level {all,useful,sensitive}] [--display {all,useful,file,none}] [--children]
                        [--summary SUMMARY]
                        PATH

Process some files to extract metadata

positional arguments:
  PATH                  Folder or file path

options:
  -h, --help            show this help message and exit
  --output, -o OUTPUT   Store data in output file
  --level, -l {all,useful,sensitive}
                        How much metadata do you want?
  --display, -d {all,useful,file,none}
                        What do you want displayed?
  --children, -c        Tries to parse metadata of files within files
  --summary, -s SUMMARY
                        Generate and dumps a summary of the data in a given file

Example: python metadatalysis.py FOLDER -c -l all -d file -o metadata.csv -o metadata.json
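For readers who want to script around the tool, the help text above maps directly onto Python's argparse. The sketch below reconstructs a parser from that help text only. It is an approximation, not the project's source: the build_parser name is hypothetical, and no defaults are set because the help output does not state any.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the command-line interface described in the usage text."""
    parser = argparse.ArgumentParser(
        prog="metadatalysis.py",
        description="Process some files to extract metadata",
    )
    parser.add_argument("PATH", help="Folder or file path")
    parser.add_argument("--output", "-o", help="Store data in output file")
    parser.add_argument(
        "--level", "-l",
        choices=["all", "useful", "sensitive"],
        help="How much metadata do you want?",
    )
    parser.add_argument(
        "--display", "-d",
        choices=["all", "useful", "file", "none"],
        help="What do you want displayed?",
    )
    parser.add_argument(
        "--children", "-c",
        action="store_true",
        help="Tries to parse metadata of files within files",
    )
    parser.add_argument(
        "--summary", "-s",
        help="Generate and dumps a summary of the data in a given file",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With a parser declared this way, an option given twice keeps only its last value, since argparse's default store action overwrites earlier ones.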

Issues

If you hit a "too many open files" error, raise the soft limit on open file descriptors in your shell before running the tool: ulimit -Sn 10000
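The same limit can also be raised from inside a Python process with the standard resource module, which may be convenient when the tool is launched from another script. This is a sketch under the assumption of a Linux system; raise_nofile_limit is a hypothetical helper and is not part of metadatalysis.py.

```python
import resource


def raise_nofile_limit(wanted: int = 10000) -> int:
    """Raise the soft open-files limit toward `wanted`; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    # An unprivileged process cannot exceed the hard limit.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # keep the current limit if the system refuses the change
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The function never lowers an existing limit, and it caps the request at the hard limit, so it can be called safely without root privileges.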

Limitations

Images are only extracted from pdf, xlsx, pptx and docx documents.
PDF analysis uses pypdf, which is slow and sometimes crashes.
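As background on why the office formats are supported: docx, pptx and xlsx files are ZIP archives that store embedded images under a media folder (word/media/, ppt/media/ and xl/media/ respectively). The sketch below lists those entries with the standard zipfile module. It is an illustration of the general idea, not the tool's own extraction code, and list_embedded_media is a hypothetical name. PDF files are not covered by this sketch.

```python
import zipfile
from typing import List

# Folders that hold embedded media inside each OOXML container.
MEDIA_PREFIXES = ("word/media/", "ppt/media/", "xl/media/")


def list_embedded_media(path_or_file) -> List[str]:
    """Return the sorted archive names of media files embedded in an OOXML document."""
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Keep files under a media folder, skip directory entries.
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/")
        )
```

Each listed entry can then be read from the archive with ZipFile.read() and written to disk so that exiftool can inspect it like any other image.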

Similar Projects

MetaDetective does something similar.
metagoofil downloads files so their metadata can be checked.
mat2 removes sensitive metadata from files.

License

Published under the MIT license.

