Welcome to the ScanningData GitHub repository! This repository contains sets of scripts essential for data scanning in support of the seeWaybeyond Platform, although they can also be used completely stand-alone. The repository is still in its early days.
Structured data usually resides in relational databases (RDBMS). Fields store length-delineated data such as phone numbers, Social Security numbers, or ZIP codes. Even variable-length text strings such as names are contained in records, which makes the data simple to search.
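As a minimal sketch of why structured data is simple to search, the Python snippet below builds an in-memory SQLite table and looks up records by a fixed-format field. The table name `contacts`, its columns, the helper `find_by_zip`, and the sample rows are all hypothetical and are not part of this repository.

```python
import sqlite3

def find_by_zip(conn, zip_code):
    """Return (name, phone) pairs for all contacts in the given ZIP code."""
    cur = conn.execute(
        "SELECT name, phone FROM contacts WHERE zip = ? ORDER BY name",
        (zip_code,),
    )
    return cur.fetchall()

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Ada Example", "555-0100", "94105"),
        ("Bob Sample", "555-0101", "10001"),
        ("Cy Placeholder", "555-0102", "94105"),
    ],
)

matches = find_by_zip(conn, "94105")
print(matches)
```

Because every record follows the same schema, a single parameterized query is enough to locate the matching rows.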
Unstructured data has internal structure but is not organized by a pre-defined data model or schema. It may be textual or non-textual, and human- or machine-generated. It may also be stored in a non-relational (NoSQL) database. Typical human-generated unstructured data includes:
- Text files: Word processing, spreadsheets, presentations, email, logs.
- Email: Email has some internal structure thanks to its metadata, and we sometimes refer to it as semi-structured. However, its message field is unstructured and traditional analytics tools cannot parse it.
- Social Media: Data from Facebook, Twitter, LinkedIn.
- Website: YouTube, Instagram, photo sharing sites.
- Mobile data: Text messages, locations.
- Communications: Chat, IM, phone recordings, collaboration software.
- Media: MP3, digital photos, audio and video files.
- Business applications: MS Office documents, productivity applications.
Typical machine-generated unstructured data includes:
- Satellite imagery: Weather data, land forms, military movements.
- Scientific data: Oil and gas exploration, space exploration, seismic imagery, atmospheric data.
- Digital surveillance: Surveillance photos and video.
- Sensor data: Traffic, weather, oceanographic sensors.
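Textual unstructured data has no fields to query, so scanning it usually means searching the raw text for patterns. The Python snippet below is a minimal sketch of that idea, assuming simple regular expressions for US Social Security numbers and ZIP codes. The `PATTERNS` table, the `scan_text` helper, and the sample message are hypothetical and are not taken from this repository's scripts.

```python
import re

# Illustrative patterns only; real scanning rules would need to be stricter.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}

def scan_text(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

# Hypothetical free-text message, such as the body of an email.
sample = "Please update my record: SSN 123-45-6789, mailing ZIP 94105."
hits = scan_text(sample)
print(hits)
```

Pattern matching of this kind only finds values that follow a recognizable format, which is one reason unstructured data is harder to search than structured records.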
Semi-structured data maintains internal tags and markings that identify separate data elements, which enables information grouping and hierarchies. Examples include:
- Markup language XML: a semi-structured document language.
- Open standard JSON (JavaScript Object Notation): another semi-structured data interchange format.
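To illustrate how tags and keys identify separate data elements, the Python sketch below reads the same record from an XML document and a JSON document using only the standard library. The sample documents and the helper names `fields_from_xml` and `fields_from_json` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records carrying the same fields in two formats.
XML_DOC = "<person><name>Ada Example</name><zip>94105</zip></person>"
JSON_DOC = '{"name": "Ada Example", "zip": "94105"}'

def fields_from_xml(doc):
    """Map each child tag of the root element to its text content."""
    root = ET.fromstring(doc)
    return {child.tag: child.text for child in root}

def fields_from_json(doc):
    """Parse a JSON object into a dict of its key/value pairs."""
    return json.loads(doc)

print(fields_from_xml(XML_DOC))
print(fields_from_json(JSON_DOC))
```

In both cases the element names travel with the data, so a scanner can locate a field such as `zip` without a pre-defined schema.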
Loosely grouped for now.
- 3rd Party Data
- Cloud Data
- Big Data