ParriauxMaxime/skyquest

Skyquest

A Skyblog oddity

Powaa

In a non-indexed website, no one can hear you crawl


Description

Skyrock.com is a social networking site based in France that offers free web space where users can create blogs, add profiles, and exchange messages with other registered members.

Between 2006 and 2009, almost every French teenager had a blog there.
Some are okay, some are creepy, but most of them are definitely cringe.

Can you feel it? The dense and awful perfume of cryptic messages between teenagers, which is not without reminding you of the early Facebook odyssey.


Goal

My significant other had one of them. She challenged me to find it.


Approach

After a few months of fruitless manual labor, my research led me to several conclusions:

  1. Google refuses to index most of Skyrock.com (some of it is indexed, but less than X%)

  2. Most of the Skyrock userbase was young (around 16 years old), and the Internet was deep and unknown at that time, so most content would be gibberish.

  3. Usernames are at most 24 characters long and are used in the URL, which restricts the charset to only 37 characters [abcdefghijklmnopqrstuvwxyz0123456789-] (lowercase only, since the username ends up in the URL).
    E.g.: if my username is ParriauxMaxime, my skyblog should exist at https://parriauxmaxime.skyrock.com.

  4. Being a social network before Facebook's mass adoption in France (~2009, give or take), Skyrock allowed profiles to connect to each other: Fan/Source.
    Fans and sources usually being 1:1, let's focus on fans only.
    Those are located at https://parriauxmaxime.skyrock.com/fans.html, https://parriauxmaxime.skyrock.com/fans2.html, ...

  5. Most of the userbase (at least 80%) had filled in some essential information: age, location, postal code, country, etc.
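Points 3 and 4 can be turned into a small URL builder. This is only a sketch: `blogUrl` and `fansPageUrl` are hypothetical helper names, not functions from this repository, and the page-numbering rule (`fans.html`, then `fans2.html`, ...) is inferred from the two URLs listed above.

```typescript
// Hypothetical helpers (names are illustrative, not taken from the repository).
const MAX_USERNAME_LENGTH = 24;
// The 37-character charset from point 3: a-z, 0-9 and "-".
const USERNAME_PATTERN = /^[a-z0-9-]+$/;

// Builds the blog URL for a username, lowercasing it first.
function blogUrl(username: string): string {
  const slug = username.toLowerCase();
  if (
    slug.length === 0 ||
    slug.length > MAX_USERNAME_LENGTH ||
    !USERNAME_PATTERN.test(slug)
  ) {
    throw new Error(`Invalid Skyrock username: ${username}`);
  }
  return `https://${slug}.skyrock.com`;
}

// Builds the URL of the n-th fans page (1-based).
// Assumption: page 1 has no numeric suffix, page 2 is fans2.html, and so on.
function fansPageUrl(username: string, page: number): string {
  if (Math.floor(page) !== page || page < 1) {
    throw new Error(`Invalid page number: ${page}`);
  }
  const suffix = page === 1 ? "" : String(page);
  return `${blogUrl(username)}/fans${suffix}.html`;
}

console.log(blogUrl("ParriauxMaxime"));
console.log(fansPageUrl("ParriauxMaxime", 2));
```

Rejecting anything outside the charset up front avoids sending requests for hostnames that cannot exist.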

Basically, brute-forcing every possible nickname and scraping data on the go would take around 4.37 * 10³⁴ seconds, assuming 1000 fetches/second.
Building on the research above, doing the same with a 68% confidence in the nickname length (6 - 15) would still take an eternity (~10¹¹ seconds).
(Been there, done that, useless)
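For the curious, here is a back-of-the-envelope version of that estimate. It is a sketch that assumes only what is stated above (a 37-character charset and 1000 fetches per second); `candidateCount` and `secondsToEnumerate` are hypothetical names, and plain floating-point numbers are used since only the order of magnitude matters.

```typescript
const CHARSET_SIZE = 37;

// Number of possible usernames with a length between minLength and maxLength.
function candidateCount(minLength: number, maxLength: number): number {
  let total = 0;
  for (let length = minLength; length <= maxLength; length++) {
    total += Math.pow(CHARSET_SIZE, length);
  }
  return total;
}

// Time needed to try every candidate at a fixed request rate.
function secondsToEnumerate(count: number, fetchesPerSecond: number): number {
  return count / fetchesPerSecond;
}

const fullRange = secondsToEnumerate(candidateCount(1, 24), 1000);
const likelyRange = secondsToEnumerate(candidateCount(6, 15), 1000);

console.log(`lengths 1-24: ${fullRange.toExponential(2)} seconds`);
console.log(`lengths 6-15: ${likelyRange.toExponential(2)} seconds`);
```

With only those two assumptions, the full range lands in the 10³⁴ seconds ballpark, and the 6 to 15 range still comes out around 10²⁰ seconds in this sketch (higher than the rough figure quoted above), so it stays firmly in "eternity" territory either way.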


Remember six degrees of separation?
In a nutshell, you're connected to anybody on this planet in at most 6 hops.

The sane approach is to walk through fans whose location is "close" to your target.
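That idea boils down to a breadth-first walk over the fan graph that only keeps expanding profiles located near the target. The sketch below is not the crawler from this repository: `Profile`, `getFans`, `crawlNearbyFans` and the distance threshold are all assumptions made for illustration, and `getFans` stands in for whatever returns the fans of a username (a scraper, or a lookup in already-crawled data).

```typescript
// Hypothetical shapes; the real crawler may store profiles differently.
interface Profile {
  username: string;
  lat?: number;
  lon?: number;
}
type GetFans = (username: string) => Profile[];

// Great-circle distance in kilometers (haversine formula).
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

// Breadth-first walk over fans, up to maxDepth hops from the seed.
// Only fans within maxKm of the target location are kept and expanded.
function crawlNearbyFans(
  seed: string,
  target: { lat: number; lon: number },
  maxKm: number,
  maxDepth: number,
  getFans: GetFans,
): Profile[] {
  const visited = new Set<string>([seed]);
  const found: Profile[] = [];
  let frontier: string[] = [seed];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const username of frontier) {
      for (const fan of getFans(username)) {
        if (visited.has(fan.username)) continue;
        visited.add(fan.username);
        // Assumption: profiles without a location are skipped in this sketch.
        if (fan.lat === undefined || fan.lon === undefined) continue;
        if (haversineKm(target.lat, target.lon, fan.lat, fan.lon) > maxKm) continue;
        found.push(fan);
        next.push(fan.username);
      }
    }
    frontier = next;
  }
  return found;
}
```

Pruning far-away fans at every hop is what keeps the frontier from exploding, at the cost of missing targets only reachable through distant or location-less profiles.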


Get started

  1. Install the dependencies
  yarn
  # or
  npm i
  2. You're going to need a database (Postgres with PostGIS enabled; I also used Metabase to get a quick look at the awful amount of data in there)
  docker-compose up -d
  3. Populate your database with what I got from 48 hours of crawling (you need psql installed locally)
  yarn psql_init
  # or
  npm run psql_init
  4. (Optional) If you plan to continue scraping, you can uncache some predata
  yarn uncache
  # or
  npm run uncache
  5. Start the application
  yarn start:dev
  6. REPL time. The profile command will fetch through 2 levels of recursion within the fans of "nickname"
  profile("nickname")

FAQ

  • Q: Did you succeed?
    A: Yes

  • Q: What did it cost?
    A: Everything

  • Q: My target did not fill in their info at that time, what should I do?
    A: Some of their friends/fans did fill in that info, so you should try to iterate on "close" (geographic area/age) fans

  • Q: What if I cannot find my target by age or location?
    A: You could try fetching the last 5 posts per blog along with the first ~100 comments. From there, you will need to implement feature detection (surname/city name/school name/etc). If you end up navigating a huge pile of crap data, you can try to use Zipf's law to trim the garbage.
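One possible reading of the Zipf suggestion, as a sketch: since word frequency falls off roughly with rank, the few top-ranked tokens are mostly filler and the long tail of one-off tokens is mostly gibberish, so you keep the middle. The function names and both thresholds (`headRanks`, `minCount`) are assumptions to tune on real data, not something implemented in this repository.

```typescript
// Lowercases the text and splits it on anything that is not a letter or digit
// (the range \u00c0-\u017f covers common accented Latin letters).
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9\u00c0-\u017f]+/)
    .filter((token) => token.length > 0);
}

// Ranks tokens by frequency, then drops the `headRanks` most frequent ones
// and every token seen fewer than `minCount` times.
function trimByRank(tokens: string[], headRanks: number, minCount: number): string[] {
  const counts = new Map<string, number>();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  const ranked = Array.from(counts.entries()).sort(
    // Most frequent first; ties broken alphabetically for a stable order.
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
  return ranked
    .slice(headRanks)
    .filter(([, count]) => count >= minCount)
    .map(([token]) => token);
}

const comments = "lol lol lol lol mdr mdr mdr paris paris xqzt";
console.log(trimByRank(tokenize(comments), 1, 2));
```

What survives the trim is a much smaller vocabulary to run the surname/city/school detection on.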





Built with Nest

A progressive Node.js framework for building efficient and scalable server-side applications.

Description

Nest framework TypeScript starter repository.

Installation

$ npm install

Running the app

# development
$ npm run start

# watch mode
$ npm run start:dev

# production mode
$ npm run start:prod

Test

# unit tests
$ npm run test

# e2e tests
$ npm run test:e2e

# test coverage
$ npm run test:cov

Support

Nest is an MIT-licensed open source project. It can grow thanks to the sponsors and support by the amazing backers. If you'd like to join them, please read more here.

License

Nest is MIT licensed.