JoseMezaVila/datajournalism-resources


datajournalism-resources

A compilation of links to data journalism & OSINT tools, guides and resources I find useful to keep at hand. PRs welcome!

by r3mlab | License: CC-BY-NC 4.0

Legend:

  • 🌐 = online tool/service/database
  • 💻 = software
  • 📖 = guide/tutorial
  • 📁 = list of tools/resources
  • 🐍 = Python module
  • 💲 = paid or paid-only tool/service

Contents

APIs

  • Postman 💻 - API development environment offering useful tools for crafting and debugging API requests.
  • ProgrammableWeb 📁 - A good API directory.
  • Public APIs 📁 - A categorized list of APIs.

Archival

Breached Data

  • Breach Data Search Engines Comparison 📁 (IntelTechniques)
  • CardPwn 💻 - Find out if a credit card number appears in a breach.
  • Dehashed 🌐💲 - Find cleartext & hashed passwords from data breaches (paid, $4/week, $11/mo).
  • GhostProject 🌐 - Check if an email appears in a breach. Shows the first 3 characters of the password for free.
  • h8mail 💻 - Find passwords through different breach and reconnaissance services. Can also search the BreachedCompilation torrent.
  • Have I Been Pwned? 🌐 - Check if an email appears in a breach, set up alerts.
  • pwndb.py 💻 - Command-line tool for searching leaked credentials using the onion service of the same name.
  • WhatBreach 💻 - Search for breached emails and their corresponding database.
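Have I Been Pwned also runs a companion Pwned Passwords service whose range API uses a k-anonymity model: you SHA-1 hash the password, send only the first 5 hex characters, and compare the returned suffixes locally. The sketch below shows that flow; the endpoint URL and the "SUFFIX:COUNT" response format follow the public API documentation, while the function names are hypothetical and no network request is made.

```python
import hashlib

# Range endpoint of the Pwned Passwords API; only the 5-character prefix is ever sent.
RANGE_URL = "https://api.pwnedpasswords.com/range/{}"

def split_sha1(password):
    """Hash the password with SHA-1 and split the uppercase hex digest into (prefix, suffix)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_range_response(body, suffix):
    """Find `suffix` in a range response ("SUFFIX:COUNT" per line); return its count, or 0."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0

if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    print("Request:", RANGE_URL.format(prefix))
    print("Compare locally against suffix:", suffix)
```

The server never sees the full hash: it returns every suffix sharing the prefix, and the match happens on your machine.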

Companies

  • CompaniesHouse Short Guide 📖 (Bellingcat) - A guide about the UK online company registry.
  • DocumentCloud Search 🌐 - Search public documents uploaded to DocumentCloud, a publishing platform used by many journalists and media.
  • ICIJ's Offshore Leaks Database 🌐 - Data on offshore companies, foundations and trusts from the Panama Papers, the Offshore Leaks, the Bahamas Leaks and the Paradise Papers investigations.
  • List of company registers 📁 (Wikipedia) - A list of company registers, by country.
  • OCCRP Data 🌐 - Fantastic search tool & resources made available by OCCRP. Public records, leaks, scraped business registers, and more.
  • OCCRP Investigative Dashboard 📁 - Collection of the most useful public data sources for investigative reporting. Many business registries listed.
  • OpenCorporates 🌐 - A very comprehensive companies database. Has an API.
  • Open Ownership Register 🌐 - Explore beneficial ownership data. Aggregates many datasets.

Data Analysis & Manipulation

See also: Visualization

  • csvkit 💻 - A suite of command-line tools for converting to and working with CSV files.
  • OpenRefine 💻 - Clean & transform messy data.
  • pandas 🐍 - Powerful Python data analysis library. Best used in a Jupyter notebook.
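As a rough illustration of the filter-then-count step that csvkit or pandas handle in a single command or line, here is a standard-library-only sketch. The function name, the `where` filter and the sample columns are hypothetical.

```python
import csv
import io
from collections import Counter

def count_by_column(csv_text, column, where=None):
    """Count rows per value of `column`, keeping only rows for which `where(row)` is true."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        if where is None or where(row):
            counts[row[column].strip()] += 1
    return counts

if __name__ == "__main__":
    sample = "name,country\nA,FR\nB,FR\nC,UK\n"
    print(count_by_column(sample, "country").most_common())
```

In pandas the same count would typically be a `groupby` on the column; the stdlib version is only meant to show the logic.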

Email

See also: Breached Data

  • emailrep.io 🌐 - Public email reputation search & API. Can find social media profiles.
  • Infoga 💻 - Gather email account information (IP, hostname, country, etc.) from different public sources.
  • theHarvester 💻 - Python command-line tool that queries several search engines for email addresses from a particular domain.
  • The most complete guide to finding anyone's email 📖 (Blurbiz)
  • Trumail 🌐 - Free email verification API.
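Tools like theHarvester collect pages from search engines and then pull out the addresses that belong to the target domain. A minimal sketch of that extraction step, assuming you already have the page text; the regex is a pragmatic approximation (not a full RFC 5322 parser) and deliberately ignores subdomains such as `mail.example.org`.

```python
import re

def emails_for_domain(text, domain):
    """Return sorted, lowercased addresses in `text` whose domain is exactly `domain`."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@" + re.escape(domain)
        # Reject matches that continue into a longer hostname, e.g. example.org.evil.net
        + r"(?![\w-]|\.\w)",
        re.IGNORECASE,
    )
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})

if __name__ == "__main__":
    page = "Contact Jane.Doe@Example.org or press@example.org."
    print(emails_for_domain(page, "example.org"))
```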

Lists of tools & resources

Location, Maps, Satellite Imagery

Interpretation

Mapping services & software

Tools & techniques

User generated content

See also: Social Networks

  • EchoSec 🌐💲 - Search and analyze social media data based on location. ($499/mo)
  • GeoCreepy 💻 - Geolocation information gathering through social networking platforms (discontinued).
  • Kamerka 💻 - Create an interactive map of cameras, printers, tweets and photos based on your coordinates.
  • OpenStreetMap 🌐 - User-generated locations & maps. Use taginfo and/or overpass-turbo.eu to search a location by key/value tags (see OSM's Wiki).
  • Mapillary 🌐 - Interactive map of crowdsourced geotagged photographs.
  • OpenStreetCam 🌐 - Map of crowdsourced street-level photographs.
  • Social networks (see category)
  • Surveillance under Surveillance 🌐 - User-contributed map of cameras and guards.
  • Tourism & review websites: Foursquare, TripAdvisor, Yelp, etc. 🌐
  • Vkontakte 🌐 - Use near:<coordinates> in a search.
  • Wikimapia 🌐 - User-generated locations & descriptions. Has an API.
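Overpass Turbo is a front-end for the Overpass API, which takes queries written in Overpass QL. The sketch below builds a query for nodes carrying a given key/value tag within a radius of a coordinate; the resulting string can be pasted into overpass-turbo.eu or sent as the `data` parameter to a public Overpass endpoint. The endpoint constant and function names are assumptions for illustration, and no request is made.

```python
from urllib.parse import urlencode

# One public Overpass instance; others exist and usage limits apply.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def overpass_around_query(key, value, lat, lon, radius_m=500):
    """Overpass QL: nodes tagged key=value within radius_m metres of (lat, lon), as JSON."""
    return (
        "[out:json][timeout:25];"
        f'node["{key}"="{value}"](around:{radius_m},{lat},{lon});'
        "out body;"
    )

def overpass_request_url(query):
    """URL-encode the query as the `data` parameter of a GET request."""
    return OVERPASS_URL + "?" + urlencode({"data": query})

if __name__ == "__main__":
    q = overpass_around_query("amenity", "cafe", 48.8584, 2.2945, 300)
    print(q)
    print(overpass_request_url(q))
```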

Military/Weapons

Multi-purpose tools

  • Buscador 💻 - A very handy VM with plenty of pre-installed & pre-configured OSINT tools.
  • DataSploit 💻 - A collection of Python scripts that automates open source intelligence searches about domain names, email addresses, IP addresses and usernames.
  • IntelligenceX Tools 🌐 - Various search, email and domain tools.
  • Maltego CE 💻 - Interactive data mining & mapping tool.
  • Spiderfoot 💻 - Open source intelligence automation tool. Gathers intelligence about a given target, which may be an IP address, domain name, hostname, network subnet, ASN, e-mail address or person's name.

News

  • AllYouCanRead 📁 - Database of news outlets by country.
  • NewsLookup 🌐 - News search engine with useful filters.
  • NewsNow 🌐 - News search engine with useful filters.
  • NewspaperMap 🌐 - Newspapers world map with feeds and automatic translation.

Phone numbers

Pictures, Photos, Videos

Pictures Metadata

Reverse search

  • Bing Images 🌐 - Can search part of an image by resizing on the fly.
  • CitizenEvidence 🌐 - Google Images reverse search on YouTube thumbnails.
  • EagleEye 💻 - Find Instagram, FB and Twitter profiles using image recognition and reverse image search.
  • Google Images 🌐
  • Search by Image 💻 - Browser extension to quickly reverse-search an image on 20+ search engines.
  • TinEye 🌐
  • Yandex Images 🌐

Search

  • How to Conduct Comprehensive Video Collection (Bellingcat) 📖
  • PimEyes 🌐 - Face-recognition matching search engine.
  • SearchFace.ru 🌐 - Face recognition search engine for the Russian VK social network. See this guide from Bellingcat for a tutorial.
  • SocialMapper 🌐 - Social Media Mapping Tool that correlates profiles via facial recognition. Supports LinkedIn, Facebook, Twitter, Instagram, VKontakte, Weibo, Douban.

Verification & Analysis

Social Networks

All/General

  • EagleEye 💻 - Find Instagram, FB and Twitter profiles using image recognition and reverse image search.
  • HashAtIt 🌐 - Hashtag search across Twitter, Instagram, Pinterest, Facebook and Youtube.
  • Sherlock 💻 - Search for a username across 135 social media sites.
  • SocialMapper 🌐 - Social Media Mapping Tool that correlates profiles via facial recognition. Supports LinkedIn, Facebook, Twitter, Instagram, VKontakte, Weibo, Douban.
  • WhatsMyName 💻 - Search for usernames on 180+ web sites.
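Username search tools such as Sherlock and WhatsMyName work from a catalogue of per-site profile URL templates and a rule for deciding whether a profile exists. The sketch below uses only the simplest rule (HTTP 200 means "found"), which produces false positives on sites that answer 200 for missing profiles or redirect to a login page; the real tools use per-site status, message or redirect checks. The two-site template dictionary is a hypothetical subset, and the status checker is injectable so the logic can be tried offline.

```python
import urllib.error
import urllib.request

# Hypothetical two-site subset for illustration; real tools ship hundreds of templates.
SITE_TEMPLATES = {
    "GitHub": "https://github.com/{}",
    "Reddit": "https://www.reddit.com/user/{}",
}

def candidate_urls(username, templates=SITE_TEMPLATES):
    """Fill each site's profile-URL template with the username."""
    return {site: template.format(username) for site, template in templates.items()}

def http_status(url):
    """Return the HTTP status code for `url`, or 0 on network errors and timeouts."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status code
        return err.code
    except OSError:  # URLError, socket timeouts, refused connections
        return 0

def found_sites(username, status_of=http_status, templates=SITE_TEMPLATES):
    """Return sites whose profile URL answers 200 (the simplest, least reliable check)."""
    urls = candidate_urls(username, templates)
    return sorted(site for site, url in urls.items() if status_of(url) == 200)
```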

Discord

  • dis.cool 🌐 - Discord search engine.

Facebook

  • fb-search 🌐 - Simple Graph query crafter. Made after Facebook's sudden closure of Graph Search.
  • FFFF Finds Facebook Friends 💻 - Builds a relationship graph of a target user. Partially reconstructs hidden friend lists. 🔥

GitHub

  • gitrob 💻 - Find potentially sensitive files pushed to public repositories on GitHub. Requires a GitHub access token.
  • Zen 💻 - Find emails of GitHub users.

Instagram

  • instaloader 💻 - Download pictures (or videos) along with their captions and other metadata from Instagram.
  • instagram-scraper 💻 - Scrape a user's photos and videos.
  • searchmybio 🌐 - Search Instagram users' biographies.

LinkedIn

Reddit

Snapchat

  • Snapdex 🌐 - Searchable database of Snapchat usernames.
  • Snap Map 🌐 - Official Snapchat map.

Telegram

  • Buzz.im 🌐 - Search in open Telegram messages.
  • Lyzem 🌐 - Telegram search engine.
  • Telegago 🌐 - Google Custom Search Engine for Telegram users & content. Can discover private groups.
  • tlgrm.eu 🌐 - Search for Telegram channels.
  • tgstat.ru 🌐 - Telegram analytics & search tool.

Twitter

  • DMI-TCAT 💻 - PHP web interface to retrieve and analyze tweets.
  • SocialBearing 🌐 - Statistics on keywords, hashtags, users.
  • SpoonBill 🌐 - Track changes in Twitter profiles & bios. Requires a Twitter account.
  • tinfoleak 💻 - Very complete open-source tool for Twitter intelligence analysis. Needs API credentials.
  • twarc 💻🐍 - A command-line tool and Python library for archiving Twitter in JSON format.
  • TweetDeck 🌐
  • TweetDeck Location Search Tutorial 📖
  • Tweet Map 🌐 - Explore the world and find geo-tagged tweets.
  • Tweets Analyzer 💻 - Twitter profile analyzer with tweet activity charts, locations, most used hashtags, etc. Can save tweets to JSON. Requires a Twitter API key.
  • tweetsmapper 💻 - Generates a Leaflet map for a given user or from an existing collection of tweets. Can retrieve full timelines.
  • TWINT (Twitter Intelligence Tool) 💻 - Advanced Twitter scraping tool, no API key needed. Can export to text, CSV, JSON, SQLite, Elasticsearch. Can detect emails, phone numbers, profiles.
  • Who Tweeted It First? 🌐 - Find out who was the first person who tweeted a link, video, quote or any piece of text.
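Archivers such as twarc write one JSON tweet object per line, which makes quick tallies easy. A minimal sketch that counts hashtags, assuming v1.1-style tweet objects where hashtags sit under `entities.hashtags[*].text` (Twitter API v2 output, e.g. from twarc2, uses `entities.hashtags[*].tag` instead, so adjust the key for that format).

```python
import json
from collections import Counter

def top_hashtags(jsonl_lines, n=10):
    """Count case-folded hashtags across line-oriented tweet JSON and return the n most common."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:  # tolerate blank lines in the archive
            continue
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    # Hypothetical file name; any iterable of JSON lines works.
    with open("tweets.jsonl", encoding="utf-8") as fh:
        print(top_hashtags(fh))
```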

VKontakte

  • SnRadar 🌐 - Search VKontakte content by location.

YouTube

  • Unlisted Videos 🌐 - Search & submit unlisted YouTube videos. No registration required.

Text & Documents

Documents metadata

  • Apache Tika 💻 - Extract metadata and text from over a thousand different file types.
  • FOCA 🌐💻 - Find metadata and hidden information in Microsoft Office, Open Office, or PDF files.
  • ICIJ Extract 💻 - A command line tool for parallelized, distributed content-extraction.
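To see the kind of metadata tools like FOCA surface, note that modern Office files (.docx, .xlsx, .pptx) are ZIP archives whose `docProps/core.xml` part records fields such as author and last editor. A standard-library sketch that reads a few of those fields; the function name and the selection of fields are assumptions, and older binary formats (.doc) or PDFs need a different approach.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}

def ooxml_core_properties(path_or_file):
    """Return the non-empty core properties of an OOXML file (path or binary file object)."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    result = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            result[name] = element.text.strip()
    return result
```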

Indexing & searching

  • Aleph 💻 - A toolkit for data search, management and analysis in investigative reporting.
  • Blacklight 💻 - Open source Solr user interface discovery platform.
  • Datashare 💻 - Index & search documents on your computer, automatically detect people, organizations and locations with NLP.
  • DumpsterDiver 💻 - Analyze big volumes of various file types in search of secrets, credentials, etc.
  • ICIJ Extract 💻 - A command line tool for parallelized, distributed content-extraction.
  • searchbox 💻 - A simple out-of-the-box web interface to search through thousands of unstructured documents using Solr.
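Search platforms like Solr (behind Blacklight and searchbox) rest on an inverted index: a map from each term to the documents that contain it. The toy sketch below shows that idea with AND queries; it is not how any of the listed tools are implemented and skips ranking, stemming and stop words.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```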

OCR

  • NewOCR.com 🌐 - Recognizes several languages. Can resize images & has shortcuts to Google & Bing Translate.
  • Tesseract 💻 - Open-source OCR engine.
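Tesseract is usually driven from the command line as `tesseract <image> <outputbase> -l <lang>`; passing `stdout` as the output base prints the recognized text instead of writing a file, and several languages can be combined with `+`. A small Python wrapper sketch, assuming the `tesseract` binary and the needed language data are installed; the function names are hypothetical.

```python
import shutil
import subprocess

def tesseract_command(image_path, lang="eng"):
    """Build the CLI call; 'stdout' as output base makes tesseract print the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]

def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    completed = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return completed.stdout
```

For example, `ocr_image("scan.png", "eng+fra")` would OCR a page containing English and French text.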

PDF

  • PDF Text Extraction with PyPDF2, Tika & PDFMiner. 💻
  • tabula 💻 - Tool for liberating data tables trapped inside PDF files.

Text Processing & Analysis

  • topia 🐍 - Python module to determine important terms within a given piece of content.
  • TXM 💻 - Lexicometry and text statistical analysis for large bodies of text.

Transportation

Containers & Shipments

  • BIC Code Register 🌐 - Lookup of container owner codes (BIC codes) registered with the Bureau International des Containers. The website also has other search tools and useful information on container markings.
  • Prefix List 🌐 - Find the owner of a container from its prefix.
  • track-trace 🌐 - Track parcels/shipments, air cargo, containers and post.
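Before looking up a container prefix, it helps to check that the number was transcribed correctly. ISO 6346 container numbers are a 3-letter owner code, a category letter (U, J or Z), a 6-digit serial and a check digit; the first four letters form the prefix you look up in the BIC register. The check digit weights each character by 2^position (letters mapped A=10 upward, skipping multiples of 11), sums, and takes the result mod 11, then mod 10. A sketch with hypothetical function names:

```python
import re
import string

def _letter_values():
    """ISO 6346 letter values: A=10 upward, skipping multiples of 11 (11, 22, 33)."""
    values, v = {}, 10
    for ch in string.ascii_uppercase:
        if v % 11 == 0:
            v += 1
        values[ch] = v
        v += 1
    return values

LETTER_VALUES = _letter_values()
CONTAINER_RE = re.compile(r"^([A-Z]{3})([UJZ])(\d{6})(\d)$")

def check_digit(first_ten):
    """Compute the check digit from owner code + category letter + 6-digit serial."""
    total = 0
    for position, ch in enumerate(first_ten):
        value = LETTER_VALUES[ch] if ch.isalpha() else int(ch)
        total += value * (2 ** position)
    return total % 11 % 10

def parse_container(number):
    """Return (4-letter prefix, is_valid) for a number like 'CSQU3054383', or (None, False)."""
    cleaned = re.sub(r"[\s-]", "", number.upper())
    match = CONTAINER_RE.match(cleaned)
    if not match:
        return None, False
    owner, category, serial, digit = match.groups()
    valid = check_digit(owner + category + serial) == int(digit)
    return owner + category, valid
```

For `CSQU305438` the weighted sum is 6185, and 6185 mod 11 = 3, so `CSQU3054383` is valid and `CSQU` is the prefix to look up.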

Planes

Ships

Visualization

Graphs

  • Data Visualisation Catalogue 📖 - Find which visualisation is right for what you want to show. Plenty of tips & resources.
  • Datawrapper 🌐💲 - Easy-to-use graph & map tool. Free plan available.
  • Google Fusion Tables 🌐 - Create maps & charts from data. Shut down in December 2019.
  • Matplotlib 🐍 - Python 2D plotting library. Best used with pandas in a Jupyter notebook.
  • RAWGraphs 🌐💻 - Generate static graphs through a very user-friendly interface. Can be run locally.

Maps

  • ArcGIS 💻💲 - Mapping & analysis software (proprietary, paid, 21-day trial).
  • Folium 🐍 - Python library to create Leaflet.js maps. Can be used in a Jupyter Notebook to map data from pandas.
  • Geopy 🐍 - Python geocoding library. Supports OSM Nominatim, Google, Bing, GeoNames & many more.
  • Google:
  • Humanitarian Data Exchange 🌐 - Useful resources of shapefiles, especially for administrative boundaries.
  • KML Interactive Sampler 🌐 - Lots of KML templates.
  • QGIS 💻 - Free & open-source alternative to ArcGIS.
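KML files like those in the KML Interactive Sampler can also be generated from your own data and opened in Google Earth or QGIS. A minimal standard-library sketch that writes one Placemark per point; note that KML orders coordinates as longitude,latitude[,altitude], a common source of misplaced pins. The function name and input shape are assumptions.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def points_to_kml(points):
    """points: iterable of (name, lat, lon). Returns a KML document as a string."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is lon,lat,alt
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```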

Mindmaps & Network graphs

Timelines

  • Tik Tok 💻 - JavaScript tool to easily create simple, mobile-friendly, vertical timelines. Open-source.
  • TimelineJS 💻
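TimelineJS can load events from a Google Sheet or from a JSON file. The sketch below turns a list of dated events into JSON with an `events` array, each entry holding a `start_date` (year/month/day) and a `text` object with `headline` and `text`. This follows our reading of the TimelineJS JSON format; check the field names against the current TimelineJS documentation before relying on it. The function name and input shape are hypothetical.

```python
import json

def timeline_json(events):
    """events: iterable of (datetime.date, headline, text). Returns a TimelineJS-style JSON string."""
    payload = {"events": []}
    for when, headline, text in events:
        payload["events"].append({
            "start_date": {"year": str(when.year), "month": str(when.month), "day": str(when.day)},
            "text": {"headline": headline, "text": text},
        })
    return json.dumps(payload, ensure_ascii=False, indent=2)
```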

Weather

Websites

See also: Archival

Dark Web & Onion services

Scraping

Searches, info, related entities

Misc

  • awesome-selfhosted 📁 - A list of Free Software network services and web applications which can be hosted locally.
  • grayhatwarfare 🌐 - Search open Amazon S3 buckets content.
  • Shodan 🌐 - Internet of Things search engine
  • World License Plates 🌐 - Pictures of license plates from all around the world.

License

This list is under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
