Skip to content

Tool for API-based acquisition of evidence from cloud storage providers

License

Notifications You must be signed in to change notification settings

andresebr/kumodd

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kumodd

Kumodd downloads files and/or generates a CSV file of metadata from a specified Google Drive account in a forensically sound manner.

  • Files can be filtered by category, such as doc, image, or video.
  • Metadata columns may be selected in the configuration file.
  • Available Google Drive API metadata is preserved.
  • Last Modified file system time stamp is preserved and verified.
  • MD5 digest is preserved and verified.
  • File size is preserved and verified.

Usage examples

Both the list (-l) and download (-d) options create a CSV file and a text table on standard output.

List (-l) all documents:

kumodd.py -l doc
Created (UTC)            Last Modified (UTC)      Remote Path                   Revision   Modified by      Owner            MD5                       
2019-06-24T05:04:47.055Z 2019-06-24T05:41:17.095Z My Drive/Untitled document    3          Johe Doe         Johe Doe         -
2019-05-18T06:16:19.084Z 2019-05-18T06:52:49.972Z My Drive/notes.docx           1          Johe Doe         Johe Doe         1376e9bf5fb781c3e428356b4b9aa21c
2019-05-16T23:34:42.665Z 2019-05-17T22:18:07.705Z My Drive/Letter to John.docx  1          Johe Doe         Johe Doe         4cb0b987cb879d48f56e4fd2cfd57d83
2019-04-12T16:21:48.867Z 2019-04-12T16:21:55.245Z My Drive/Todo List            27         Johe Doe         Johe Doe         -                   

Download (-d) all documents to ./download (the default location).

kumodd.py -d doc

Download (-d) all PDF files to path (-p) /home/user/Desktop/:

kumodd.py -d pdf -p /home/user/Desktop/

By default, native Google Apps files (docs, sheets and slides) are downloaded in PDF format. To instead download them in LibreOffice format, use the '--nopdf' option.

By default, every available revision is downloaded unless --norevisions is specified, in which case only the current file (latest revision) is downloaded. Previous revisions are saved as filename_(revision id_last modified date).

To download all of the files listed in a previously generated CSV file, use:

kumodd.py -csv ./filelist-username.csv

Time Stamps

As a convenience, kumodd sets the time stamps of files that are exported. However, due to file system and kernel limitations, the only reliable file system timestamp is the Last Written time of exported files. Other file system time stamps on exported files are unreliable. For any analysis, time stamps should instead be taken directly from the preserved metadata (e.g. foo.doc.yml metadata for a given foo.doc).

Time stamps available in Google Drive generally includes the following:

  • createdDate
  • markedViewedByMeDate
  • modifiedByMeDate
  • modifiedDate

To set the timestamps in exported files, Kumod maps these values as follows:

  • Last Modified time = modifiedDate
  • Last Accessed time = markedViewedByMeDate
  • Created time = createdDate

Windows has all three; however, setting the Created time in python via the win32 API has proven unreliable. Certain more recent Unix file systems have a created time stamp, including Ext4, UFS2, Hammer, LFS, and ZFS (see Wikipedia Comparison of File Systems). However, the Linux kernel provides no method (i.g. system call or library) to read or write the Created time, so Created time is not available to kumodd on Linux. markedViewedByMeDate is not always available in Google Drive. The Last Accessed time stamps may be overwritten by subsequent reading of exported files.

In conclusion, file system time stamps on exported files should not be relied on for any analysis. Instead of file system time stamps, analysis should use the time stamps taken directly from the preserved metadata.

Metadata

Metadata provided by the Google Drive API is preserved in YAML format (see Example raw metadata). Files are stored in ./download and their corresponding metadata are saved in ./download/metadata. For foo.doc, the file and its metadata paths would be:

One can configure which columns are written to stdout and CSV files. They are specified by the tag 'csv_columns' in config/config.yml (see Configuration). The default CSV columns are:

CSV Columns Description
title File name
category one of: doc, xls, ppt, text, pdf, image, audio, video or other
modifiedDate Last Modified Time (UTC)
modTimeMatch 'match' if local and remote Last Modification times match, else MISMATCH.
md5Checksum MD5 digest of remote file. None if file is a native Google Apps Document.
md5Local md5 of download if new or updated. Otherwise None
md5Match 'match' if local and remote MD5s match, else time difference.
fileSize Number of bytes in file
sizeMatch 'match' if local and remote sizes match, else %local/remote.
revision Number of available revisions
ownerNames A list of owner user names
createdDate Created Time (UTC)
mimeType MIME file type
path File path in Google Drive
id Unique Google Drive File ID
lastModifyingUserName Last Modified by (user name)
modifiedByMeDate Time Last Modified by Account Holder (UTC)
lastViewedByMeDate Time Last Viewed by Account Holder (UTC)
shared Is shared (true/false)

Certain metadata are computed by Kmodd. These include catetory, path, local_path, md5local, md5Match, localSize, sizeMatch, modTimeMatch and revision. These names are not found in the data retrieved from google drive, but instead are computed from the metadata retrieved from Google Drive.

Computed Metadata Value
md5Match either 'match', 'MISMATCH' or 'n/a'. md5Match is 'n/a' when the MD5 digest is not available from Google Drive, including for native Google Apps files and certain PDFs.
sizeMatch either 'match' or a percentage ratio of local/remote file size.
modTimeMatch either 'match' or the time difference of Last Modified time in DAYS HH:MM:SS.
revision the number of revisions available in Google drive.

Note: The 'thumbnailLink' attribute is transient. Kumodd removes 'thumbnailLink' because it changes each time the metadata is retrieved from Google Drive, even if the file and other metadata have not changed. When 'thumbnailLink' is excluded, the metadata is reproducible (identical each time retrieved) if the file has not changed. This also improves time efficient review of changes in the YAML using 'diff'.

Metadata names are translated to CSV column titles using 'csv_title' in the configuration file (see Configuration). If a title is not defined there, the metadata name is used as the CSV column title.

Setup

To setup kumodd, install python and git, then install kumodd and requirements, obtain an Oauth ID required for Google API use, and finally, authorize access to the specified account.

  1. Install python 3 and git. Then download kumodd and install the dependencies.

    On Debian or Ubuntu:

    apt install python3 git
    git clone https://github.com/rich-murphey/kumodd.git
    cd kumodd
    python3 -m pip install --user -r requirements.txt
    ./kmodd.py --helpfull

    On Windows, one option is to use the Chocolatey package manager.

    cinst -y python git
    git clone https://github.com/rich-murphey/kumodd.git
    cd kumodd
    python -m pip install --user -r requirements.txt
    ./kmodd.py --helpfull
  2. Obtain a Google Oauth client ID (required for Google Drive API):

    1. Create a free google cloud account.
    2. Login to your Google cloud account.
    3. Create a Project.
    4. Create Oauth2 API credential for the project.
    5. Click "Create Credentials" and select "Oauth client ID".
    6. Select the radio button "Web Application".
    7. In "Authorized redirect URIs", enter: http://localhost:8080
    8. Click "create". Next, a dialog "OAuth client" will pop up.
    9. Click OK. Next, it will show the list of "Oauth 2.0 client IDs".
    10. Click the down arrow icon at far right of the new ID. The ID will download.
    11. Copy the downloaded ID it to kumodd/config/gdrive.json.
  3. Authorize kumodd to access the cloud account:

    The first time kumodd is used (e.g. kumodd.py -l all), it will open the login page in a web browser.

    1. Login to the cloud account. Next, it will request approval.
    2. Click "Approve". Next, kumodd stores the Oauth token in config/gdrive.dat.

    If there is no local browser, or if --nobrowser is used, kumodd will instead print a URL of the login page.

    1. Copy the URL and paste it into a browser.
    2. Login to the cloud account. Next, it will request approval.
    3. Click "Approve". Next, the page will show an access token.
    4. Copy the token from the web page. Paste it into kumodd, and press enter. Next, kumodd saves the Oauth token in config/gdrive.dat.

    Once authorized, the login page will not be shown again unless the token expires or config/gdrive.dat is deleted.

Usage

./kumodd.py [flags]

flags:
  -p,--destination: Destination file path
    (default: './download')
  -d,--get_items: <all|doc|xls|ppt|text|pdf|office|image|audio|video|other>: Download files and create directories, optionally filtered by category
  -l,--list_items: <all|doc|xls|ppt|text|pdf|office|image|audio|video|other>: List files and directories, optionally filtered by category
  --log: <DEBUG|INFO|WARNING|ERROR|CRITICAL>: Set the level of logging detail.
    (default: 'ERROR')
  -m,--metadata_destination: Destination file path for metadata information
    (default: './download/metadata')
  -csv,--usecsv: Download files from the service using a previously generated CSV file
    (a comma separated list)
  --[no]browser: open a web browser to authorize access to the google drive account
    (default: 'true')
  -c,--config: config file
    (default: 'config/config.yml')
  --gdrive_auth: Google Drive acccount authorization file.  Configured in config/config.yml if not specifed on command line.
  --[no]pdf: Convert all native Google Apps files to PDF.
    (default: 'false')
  --[no]revisions: Download every revision of each file.
    (default: 'true')

Try --helpfull to get a list of all flags.

The filter option limits output to a selected category of files. A file's category is determined its mime type.

Filter Description
all All files stored in the account
doc Documents: Google Docs, doc, docx, odt
xls Spreadsheets: Google Sheets, xls, xlsx, ods
ppt Presentations: Google Slides, ppt, pptx, odp
text Text/source code files
pdf PDF files
office Documents, spreadsheets and presentations
image Image files
audio Audio files
video Video files

To relay kumodd access though an HTTP proxy, specify the proxy in config/config.yml:

proxy:
  host: proxy.host.com
  port: 8888 (optional)
  user: username (optional)
  pass: password (optional)

Configuration

Command line arguments are used for configuration specific to a data set or case, while a YAML file is used for configuration items not specific to a data set or case. This is intended to support reproducibility. Multiple configuration files can be used to generate multiple arrangements of CSV columns.

If config/config.yml does not exist, kumodd will create it using:

gdrive:
  gdrive_auth: config/gdrive_config.json
  oauth_id: config/gdrive.dat
  csv_prefix: ./filelist-
  csv_columns: title,category,modTimeMatch,md5Match,revision,ownerNames,fileSize,modifiedDate,createdDate,mimeType,path,id,lastModifyingUserName,md5Checksum,md5Local,modifiedByMeDate,lastViewedByMeDate,shared

csv_title:
  app:			Application
  category:		Category
  createdDate:		Created (UTC)
  fileSize:		Bytes
  id:			File Id
  index:		Index
  lastModifyingUserName: Modfied by
  lastViewedByMeDate:	My Last View
  local_path:		Local Path
  md5Checksum:		MD5
  md5Local:		Local MD5
  md5Match:		MD5s
  mimeType:		MIME Type
  modTimeMatch:		Mod Time
  modifiedByMeDate:	My Last Mod
  modifiedDate:		Last Modified (UTC)
  ownerNames:		Owner
  path:			Remote Path
  revision:		Revisions
  shared:		Shared
  status:		Status
  time:			Time (UTC)
  title:		Name
  user:			User
  version:		Version
Config item Description
gdrive_auth filename of the google drive account authorization. Ignored if provided on command line.
oauth_id filename of the Oauth client ID credentials
csv_prefix the leading portion of the CSV file path. Username and .csv are appended.
csv_title list of column titles for each metadata name

Caveats

Google Drive permits duplicate file names within a folder, whereas Unix and Windows filesystems generally refuse it. Duplicates within a folder cause missing files and mismatching metadata. As it stands, Kumodd does not export directly to a logical forensic image format, which would resolve this.

Downloading native Google Apps docs, sheets and slides is much slower than non-native files, because format conversion to PDF of LibreOffice is required.

Validation is limited to available data. Native Google Apps and certain PDF files do not provide a MD5 digest. Last Modify time is the only reliable file system time stamp. To detect changes, kumod compares the MD5, file size and Last Modify time. If any of these differ from Google Drive's metadata, kumodd will download and update the file and YAML metadata.

Using an HTTP proxy on Windows does not work due to unresolved issues with python 3's httplib2.

Google rate limits API calls. At the time of writing, the default rate limits are:

  • 1,000,000,000 queries per day
  • 1,000 queries per 100 seconds per user
  • 10,000 queries per 100 seconds

Kumodd uses the Google API Python Client which is officially supported by Google, and is feature complete and stable. However, it is not actively developed. It has has been replaced by the Google Cloud client libraries which are in development, and recommended for new work.

Developer Notes

To get debug logs to stdout, set 'log_to_stdout: True' in config.yml.

Example raw metadata

Metadata provided by the Google Drive are described in the Google Drive API Documentation. A few of the available metadata are shown in the following YAML. This is the metadata of a PDF file.

alternateLink: https://drive.google.com/a/murphey.org/file/d/0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ/view?usp=drivesdk
appDataContents: false
capabilities: {canCopy: true, canEdit: true}
category: pdf
copyRequiresWriterPermission: false
copyable: true
createdDate: '2017-09-28T20:06:50.000Z'
downloadUrl: https://doc-0k-9o-docs.googleusercontent.com/docs/securesc/m7lwc9em35jjdnsnezv7rlslwb7hsf02/0b2slbx08rcsbwz9rilnq9rqup99h7nh/1562400000000/14466611316174614883/14466611316174614883/0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ?h=07676726225626533888&e=download&gd=true
editable: true
embedLink: https://drive.google.com/a/murphey.org/file/d/0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ/preview?usp=drivesdk
etag: '"_sblwcq0fTsl4917mBslb2bHWsg/MTUwNjYyOTM4OTA2Mg"'
explicitlyTrashed: false
fileExtension: pdf
fileSize: '2843534'
headRevisionId: 0B4pnT_44h5smaXVvSE9GMUtSMFJjSWVDeXQxTWhCeUFMUW9ZPQ
iconLink: https://drive-thirdparty.googleusercontent.com/16/type/application/pdf
id: 0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ
kind: drive#file
label_key: '     '
labels: {hidden: false, restricted: false, starred: false, trashed: false, viewed: false}
lastModifyingUser:
  displayName: John Doe
  emailAddress: john.doe@gmail.com
  isAuthenticatedUser: true
  kind: drive#user
  permissionId: '14466611316174614251'
  picture: {url: 'https://lh5.googleusercontent.com/-ptNwlcuNOi8/AAAAAAAAAAI/AAAAAAAAGkE/NRxpYvByBx0/s64/photo.jpg'}
lastModifyingUserName: John Doe
local_path: ./download/john.doe@gmail.com/./My Drive/TxDOT Accident Report (551632).pdf
markedViewedByMeDate: '1970-01-01T00:00:00.000Z'
md5Checksum: 5d5550259da199ca9d426ad90f87e60e
md5Local: 5d5550259da199ca9d426ad90f87e60e
md5Match: match
mimeType: application/pdf
modifiedByMeDate: '2017-09-28T20:09:49.062Z'
modifiedDate: '2017-09-28T20:09:49.062Z'
originalFilename: TxDOT Accident Report (551632).pdf
ownerNames: [John Doe]
owners:
- displayName: John Doe
  emailAddress: john.doe@gmail.com
  isAuthenticatedUser: true
  kind: drive#user
  permissionId: '14466611316174614251'
  picture: {url: 'https://lh5.googleusercontent.com/-ptNwlcuNOi8/AAAAAAAAAAI/AAAAAAAAGkE/NRxpYvByBx0/s64/photo.jpg'}
parents:
- {id: 0AIpnT_44h5smUk9PVA, isRoot: true, kind: drive#parentReference, parentLink: 'https://www.googleapis.com/drive/v2/files/0AIpnT_44h5smUk9PVA',
  selfLink: 'https://www.googleapis.com/drive/v2/files/0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ/parents/0AIpnT_44h5smUk9PVA'}
path: ./My Drive/TxDOT Accident Report (551632).pdf
quotaBytesUsed: '2843534'
revision: '1'
selfLink: https://www.googleapis.com/drive/v2/files/0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ
shared: false
spaces: [drive]
status: update
title: TxDOT Accident Report (551632).pdf
userPermission: {etag: '"_sblwcq0fTsl4917mBslb2bHWsg/TpnHf_kgQXZabQ7VDW-96dK3owM"',
  id: me, kind: drive#permission, role: owner, selfLink: 'https://www.googleapis.com/drive/v2/files/0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ/permissions/me',
  type: user}
version: '5'
webContentLink: https://drive.google.com/a/murphey.org/uc?id=0s9b2T_442nb0MHBxdmZo3pwnaGRiY01LbmVhcEZEa1FvTWtJ&export=download
writersCanShare: true

About

Tool for API-based acquisition of evidence from cloud storage providers

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages