This repository has been archived by the owner on May 24, 2024. It is now read-only.

wiki

Contains architecture and governance documents describing how this project will be built.

Content

├── Objective
├── Architecture
│   ├── Platform
│   ├── Structure
│   └── Tech Stack
└── Data sources
    ├── Financial sources 
    └── Social Sources

Objective of GranyteTech

GranyteTech is a data analytics company that collects:

  • financial data
  • nominal data (text from social media sources)

It aims to generate alpha from the sentiment of the nominal data, using that sentiment to understand the market's growth while minimising cost on GCP.

Inference will try to predict how well a security will perform as a short-term investment (roughly 3-6 months ahead).
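
To make the 3-6 month horizon concrete, here is a minimal sketch of how a prediction target could be derived from a price series. The function name `forward_returns`, the use of simple returns, and the sample prices are assumptions for illustration only; the project does not yet specify how the target is defined.

```python
from typing import List, Optional


def forward_returns(prices: List[float], horizon: int) -> List[Optional[float]]:
    """Simple return from each period to `horizon` periods later.

    The last `horizon` entries have no future price yet, so they are None.
    """
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of periods")
    out: List[Optional[float]] = []
    for i, price in enumerate(prices):
        j = i + horizon
        if j < len(prices) and price != 0:
            out.append(prices[j] / price - 1.0)
        else:
            out.append(None)
    return out


# Hypothetical month-end closing prices for one security.
monthly_close = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 115.0]

three_month = forward_returns(monthly_close, horizon=3)
six_month = forward_returns(monthly_close, horizon=6)
```

With monthly data, `horizon=3` and `horizon=6` bracket the stated prediction window; the trailing `None` values mark months whose outcome is not yet known and cannot be used for training.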

Costs

  • Nominal data will be quantified by its sentiment and aggregated to reduce training cost.
  • All data will be accessed from GCS buckets (collecting data from BigQuery is too costly).
  • All GCS bucket data will be partitioned by month (to reduce querying cost, i.e. Class A operation costs).
  • All APIs will use pagination to limit usage.
  • Data collection will be batched into weekly or monthly ingestions (cheaper than daily or streaming ingestion).
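
The sentiment aggregation and monthly partitioning rules above can be sketched as follows. This is an illustration under assumptions: the object layout (`source/year=YYYY/month=MM/data.jsonl`), the helper names, and the choice of mean and count as aggregates are all hypothetical, and the sentiment scores are made-up inputs rather than the output of any real model.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, Tuple


def month_key(day: date) -> str:
    """Key used to group records by calendar month, e.g. '2024-01'."""
    return f"{day.year:04d}-{day.month:02d}"


def partition_path(source: str, day: date) -> str:
    """Hypothetical GCS object name: one prefix per source, year and month."""
    return f"{source}/year={day.year:04d}/month={day.month:02d}/data.jsonl"


def aggregate_sentiment(
    rows: Iterable[Tuple[date, float]],
) -> Dict[str, Dict[str, float]]:
    """Collapse per-post sentiment scores into one mean and count per month."""
    buckets = defaultdict(list)
    for day, score in rows:
        buckets[month_key(day)].append(score)
    return {
        month: {"mean": mean(scores), "count": len(scores)}
        for month, scores in sorted(buckets.items())
    }


# Made-up (date, sentiment score) pairs for illustration.
rows = [
    (date(2024, 1, 5), 0.5),
    (date(2024, 1, 20), -0.1),
    (date(2024, 2, 3), 0.3),
]
monthly = aggregate_sentiment(rows)
path = partition_path("reddit", date(2024, 2, 3))
```

Training on one aggregated row per month instead of every raw post is what keeps training cost down, and a month-level prefix lets a batch job read only the objects for the period it needs.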

Failsafes & Products

If the project is unsuccessful, it can still be salvaged with two failsafes, each of which delivers well-structured, organised data to clients.

Failsafe 1: Structured financial data

Low risk, low or no reward

Fast to implement; however, multiple companies already provide services like this.

Failsafe 2: Structured nominal data

Extremely high risk, medium reward

A few services will collect data from multiple sources in a structured manner under one topic.

Architecture

Platform

Google Cloud Platform (GCP) will be the main platform used to develop this service.

Diagram

UML diagram of general architecture

Tech Stack

DevOps

Infrastructure handler (any one):

  • Terraform
  • Pulumi
  • Deployment Manager (not recommended)

Python3 (Basic, able to complete easy LeetCode questions):

  • pytest
  • unittest
  • pulumi

General:

  • Docker/containerd
  • GCP: Cloud Build
  • GCP: Secret Manager
  • GCP: IAM, roles, policies & service accounts
  • Good understanding of scalable architecture

DE, DS, MLE

Python3 (Basic, able to complete easy LeetCode questions):

  • Jupyter Notebook
  • NumPy
  • pandas
  • PyTorch/Keras (not yet chosen)

General:

  • Docker/containerd
  • GCP: Cloud Build
  • GCP: Cloud Run/Cloud Functions
  • GCP: BigQuery
  • SQL (Basic, able to complete easy LeetCode questions)
  • Understand how transformers work (but do not need to know how to program them)
  • Know how to set up GPU acceleration

SWE, BED (in late stages)

JavaScript (Basic, able to complete easy LeetCode questions):

  • TypeScript
  • Node.js
  • Express

Python3 (Basic, able to complete easy LeetCode questions):

  • Django
  • Flask

General:

  • Docker/containerd
  • GCP: Cloud Build
  • GCP: App Engine
  • Good understanding how to build

Data sources

Data collection will focus on free API sources.
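
Because the cost rules call for pagination on every API, a generic page-by-page fetch with a hard cap might look like the sketch below. The token-based paging scheme, the `iter_pages` helper, and the in-memory `fake_fetch` stand-in are assumptions for illustration; each real source has its own paging parameters, which would be wrapped inside the fetch function.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results plus the token for the next page (None when finished).
Page = Tuple[List[dict], Optional[str]]


def iter_pages(
    fetch_page: Callable[[Optional[str]], Page], max_pages: int
) -> Iterator[dict]:
    """Yield records page by page, stopping at the last page or at max_pages."""
    token: Optional[str] = None
    for _ in range(max_pages):
        items, token = fetch_page(token)
        yield from items
        if token is None:
            break


# Hypothetical stand-in for a real API client: two pages held in memory.
_FAKE_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}], None),
}


def fake_fetch(token: Optional[str]) -> Page:
    return _FAKE_PAGES[token]


records = list(iter_pages(fake_fetch, max_pages=10))
capped = list(iter_pages(fake_fetch, max_pages=1))
```

The `max_pages` cap bounds the number of requests a single batch run can make, which helps keep usage within the limits of a free tier.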

Financial Data

The majority will consist of multiple sources of macro data plus one source of micro data.

Yahoo Finance

WTO: World Trade Organization

WB: World Bank

IMF: International Monetary Fund

BLS: US Bureau of Labor Statistics (Deprecated)

FRED: Federal Reserve Economic Data

BEA: Bureau of Economic Analysis

Sources to look at

World Bank, UN, Eurostat, ADB, BEA, BLS, FRED

Social media

YouTube

Telegram

Twitter

Reddit
