## Overview
A news feed is a list of content generated by the people a user is connected to (text, images, audio, likes, status updates, etc.), tailored for that user to consume. Examples include the Facebook news feed, the Instagram feed, and the Twitter timeline.

The system needs to provide two main features:
- ability to publish a post
- ability to view connections' posts

How should posts be ordered? There are multiple options:
- chronological order
- order by post score, where the score combines multiple factors such as date and time, post type, etc.
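
The two ordering options can be sketched as follows. This is only an illustration: the `Post` fields, the per-type weights in `TYPE_WEIGHT`, and the exponential recency decay with a 24-hour half-life are all assumptions, not a scoring formula prescribed here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    post_id: int
    post_type: str        # "text", "image", or "video"
    created_at: datetime


# Hypothetical weights per post type; a real system would tune these.
TYPE_WEIGHT = {"text": 1.0, "image": 1.5, "video": 2.0}


def score(post: Post, now: datetime, half_life_hours: float = 24.0) -> float:
    """Combine recency and post type into one score (assumed formula)."""
    age_hours = (now - post.created_at).total_seconds() / 3600
    recency = 0.5 ** (age_hours / half_life_hours)  # halves every half-life
    return TYPE_WEIGHT.get(post.post_type, 1.0) * recency


def order_chronologically(posts):
    """Newest post first."""
    return sorted(posts, key=lambda p: p.created_at, reverse=True)


def order_by_score(posts, now):
    """Highest score first."""
    return sorted(posts, key=lambda p: score(p, now), reverse=True)


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
posts = [
    Post(1, "text", now - timedelta(hours=1)),
    Post(2, "video", now - timedelta(hours=24)),
    Post(3, "image", now - timedelta(hours=48)),
]
chronological_ids = [p.post_id for p in order_chronologically(posts)]
scored_ids = [p.post_id for p in order_by_score(posts, now)]
```

With these sample weights, the day-old video outranks the hour-old text post under score ordering, while chronological ordering keeps the text post first.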

A post can contain only text or text + image/video.

### Calculations
Assume there are about 10 million daily active users (DAU) and each user requests the feed 10 times a day on average. The read load is then  
$\frac{10000000}{3600 \times 24} \times 10 \approx 1157 \ QPS$  

**Storage Requirements:**
- **User Data:** If we assume 50% of total users are DAU, there are 20 million total users. If each user's data takes 50 KB, the total is $\frac{20000000 \times 50}{1024 \times 1024} \approx 954 \ GB$
- **Text Post:** keeping the top 100 posts per user for the 10 million daily active users, and assuming 90% of posts (90 per user) are text at 5 KB each, takes $\frac{10000000 \times 5 \times 90}{1024 \times 1024 \times 1024} \approx 4.2 \ TB$
- **Image Post:** assume images take 100 KB on average
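
The estimates above can be reproduced with a few lines of arithmetic. The variable names are illustrative. The image figure is an extrapolation that is not stated in the text: it assumes the remaining 10 of each user's top 100 posts are images.

```python
SECONDS_PER_DAY = 24 * 3600
KB_PER_GB = 1024 ** 2
KB_PER_TB = 1024 ** 3

# Read traffic: 10 million DAU, 10 feed requests per user per day.
dau = 10_000_000
requests_per_user_per_day = 10
feed_qps = dau * requests_per_user_per_day / SECONDS_PER_DAY

# User data: DAU is 50% of all users, 50 KB per user record.
total_users = 20_000_000
user_record_kb = 50
user_data_gb = total_users * user_record_kb / KB_PER_GB

# Text posts: 90 of the top 100 posts per user are text, 5 KB each.
text_posts_per_user = 90
text_post_kb = 5
text_posts_tb = dau * text_posts_per_user * text_post_kb / KB_PER_TB

# Image posts (ASSUMPTION): the other 10 posts are images, 100 KB each.
image_posts_per_user = 10
image_post_kb = 100
image_posts_tb = dau * image_posts_per_user * image_post_kb / KB_PER_TB

print(f"feed reads:  {feed_qps:.0f} QPS")
print(f"user data:   {user_data_gb:.0f} GB")
print(f"text posts:  {text_posts_tb:.1f} TB")
print(f"image posts: {image_posts_tb:.1f} TB (assumed 10 images per user)")
```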

## API Design
To generate a user's news feed:
- `GET /v1/me/feed?count=100` returns 100 posts from the feed

To publish a post:
- `POST /v1/me/feed` with a body indicating the content type (text/image/etc.) and the content
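
As a rough sketch of how a client might build these two requests: the host `api.example.com` and the JSON field names `type` and `content` are hypothetical, since the endpoints above only specify the paths and the `count` parameter.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host


def feed_url(count: int = 100) -> str:
    """URL for GET /v1/me/feed with the requested number of posts."""
    return f"{BASE_URL}/v1/me/feed?{urlencode({'count': count})}"


def publish_body(content_type: str, content: str) -> str:
    """JSON body for POST /v1/me/feed (field names are assumptions)."""
    return json.dumps({"type": content_type, "content": content})


print(feed_url(20))
print(publish_body("text", "Hello, feed!"))
```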

## Database Schema
User and post data are structured and can be stored in database tables. The connections (friends, followers) can also be represented as a table, but a graph database is better suited to this purpose.

<img src="images/news_feed_schema.png" />
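
For the relational option, a minimal sketch using SQLite is shown below. The table and column names are assumptions made for illustration and may not match the diagram above. The `connections` table stands in for what a graph database would model as edges.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative only.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE posts (
    post_id    INTEGER PRIMARY KEY,
    author_id  INTEGER NOT NULL REFERENCES users(user_id),
    post_type  TEXT NOT NULL,      -- 'text', 'image', ...
    content    TEXT NOT NULL,
    created_at INTEGER NOT NULL    -- Unix timestamp
);
CREATE TABLE connections (
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    connection_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (user_id, connection_id)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def connection_posts(conn, user_id, count=100):
    """IDs of the newest posts written by a user's connections."""
    rows = conn.execute(
        """
        SELECT p.post_id
        FROM posts p
        JOIN connections c ON c.connection_id = p.author_id
        WHERE c.user_id = ?
        ORDER BY p.created_at DESC
        LIMIT ?
        """,
        (user_id, count),
    )
    return [row[0] for row in rows]
```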

## Architecture
### Feed publishing

<img src="images/feed_publishing.png" />

**Post service:** persists the post in the database and cache.  
**Notification service:** informs connections that new content is available and sends out push notifications.  
**Fanout service:** is responsible for delivering posts to all connections. There are two possible models: fanout on write (push) and fanout on read (pull).  
- **Fanout on Write:** heavy lifting happens immediately when the content is written (published). The content is "pushed" into the feed (inbox) of every single connection instantly.  
      <img src="images/fanout_on_write.png" />  
      While this model ensures very fast retrieval of feeds, it has some issues. The write cost scales linearly with the number of connections (N): if a user has 10 million connections, 10 million writes occur per post. Moreover, writes also happen for inactive users who rarely log in, which wastes space and compute power.

- **Fanout on Read:** the feed is not pre-computed. The heavy lifting happens only when the user explicitly fetches their news feed. The content is "pulled" by the client on request.  
      <img src="images/fanout_on_read.png" />  
      While this model solves the problems of the *fanout on write* model, the trade-off is slower reads, because the feed must be assembled at request time.


- **Hybrid Model:** for users with a low number of connections, use *fanout on write*. For users with a high connection count, use *fanout on read*.
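
The publish side of the hybrid model can be sketched with in-memory structures. Everything here is illustrative: `CELEBRITY_THRESHOLD`, `FEED_CACHE_SIZE`, and the dictionaries standing in for the post database and news feed cache are assumptions, and posts are assumed to be published in timestamp order.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3   # hypothetical cutoff for "high connection count"
FEED_CACHE_SIZE = 100     # hypothetical number of entries kept per feed

followers = defaultdict(set)          # author_id -> ids of connections
posts_by_author = defaultdict(list)   # stand-in for the post database
# Stand-in for the news feed cache: user_id -> newest-first (timestamp, post_id)
feed_cache = defaultdict(lambda: deque(maxlen=FEED_CACHE_SIZE))


def publish(author_id, post_id, timestamp):
    """Persist the post, then fan out on write only for small audiences."""
    posts_by_author[author_id].append((timestamp, post_id))
    if len(followers[author_id]) > CELEBRITY_THRESHOLD:
        # Fanout on read: connections pull this author's posts at read time.
        return
    # Fanout on write: push into every connection's cached feed.
    for follower_id in followers[author_id]:
        feed_cache[follower_id].appendleft((timestamp, post_id))


followers["alice"] = {"bob", "carol"}
followers["star"] = {"u1", "u2", "u3", "u4"}
publish("alice", "p1", 1)   # pushed to bob and carol
publish("star", "p2", 2)    # stored only, pulled later by readers
```

The threshold trades write amplification against read latency: raising it pushes more posts eagerly, lowering it defers more work to read time.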

### News Feed Retrieval

<img src="images/feed_retrieval.png" />

The news feed service combines data from the post database, the user database, and the news feed cache to prepare the news feed for a user. Reads often follow the hybrid approach: in addition to reading the pre-computed entries from the news feed cache, the service fetches posts individually from connections that have a high connection count and merges them in. Example:

<img src="images/feed_retrieval_more.png" />
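
A minimal sketch of the read-side merge, assuming every input list is already sorted newest first and holds `(timestamp, post_id)` pairs. The function name and data shapes are hypothetical; in a real system the inputs would come from the news feed cache and the post database.

```python
import heapq
from itertools import islice


def read_feed(cached_entries, pulled_posts, count=100):
    """Merge pushed and pulled posts into one newest-first feed.

    cached_entries: (timestamp, post_id) pairs from the feed cache,
        newest first (fanout on write).
    pulled_posts: one newest-first list per high-connection author,
        fetched at read time (fanout on read).
    """
    merged = heapq.merge(
        cached_entries,
        *pulled_posts,
        key=lambda entry: entry[0],
        reverse=True,  # inputs are sorted from largest timestamp down
    )
    return [post_id for _, post_id in islice(merged, count)]


cached = [(30, "a3"), (10, "a1")]
from_celebrities = [[(40, "c4"), (20, "c2")]]
feed = read_feed(cached, from_celebrities, count=3)
```

Because the inputs are already sorted, `heapq.merge` streams them lazily, so only the first `count` entries are ever materialized.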