Start aggregating RSS feeds for eventual fine-tuning #18

Open
bigolboyyo opened this issue May 24, 2023 · 1 comment
Assignees
Labels
good first issue Good for newcomers

Comments

@bigolboyyo
Contributor

As the title states, the plan is to aggregate a list of RSS feeds and sort them into the 13 sectors associated with our solutions.

Here's a list of feeds I've started gathering:

https://grist.org/feed/
https://arstechnica.com/tag/climate-change/feed/
https://www.desmog.com/feed/
https://techcrunch.com/tag/climate-tech/feed/
https://www.carbonbrief.org/feed
https://www.greenbuildingadvisor.com/feed
https://www.sciencedaily.com/rss/earth_climate.xml

Additionally, we can use Google News RSS feeds to watch each company in our growing list of resources!

"and don’t forget there’s an RSS url pattern for every single Google News search" - @futuresoup 

For example, for the organization “One Tree Planted”, here are the regular search URL and its RSS equivalent:
https://news.google.com/search?q=%22One%20Tree%20Planted%22&hl=en-US&gl=US&ceid=US%3Aen
https://news.google.com/rss/search?q=%22One+Tree+Planted%22&hl=en-US&gl=US&ceid=US:en
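
To make the pattern reusable, here is a minimal sketch of a helper that builds the RSS search URL for any organization name. The function name `buildGoogleNewsRssUrl` and the `locale` parameter are my own (hypothetical), and the defaults simply mirror the "One Tree Planted" example; other locale values are untested.

```typescript
// Hypothetical helper: build a Google News RSS search URL for an exact-phrase query.
// The hl/gl/ceid defaults are copied from the example URL in this issue.
function buildGoogleNewsRssUrl(
  name: string,
  locale = { hl: 'en-US', gl: 'US', ceid: 'US:en' },
): string {
  const params = new URLSearchParams({
    q: `"${name}"`, // wrapping the name in quotes asks for an exact-phrase match
    hl: locale.hl,
    gl: locale.gl,
    ceid: locale.ceid,
  });
  return `https://news.google.com/rss/search?${params.toString()}`;
}

console.log(buildGoogleNewsRssUrl('One Tree Planted'));
```

Note that `URLSearchParams` percent-encodes the colon in `ceid` (`US%3Aen`), as the regular search URL above does. That should be equivalent to the literal `US:en` form once the server decodes the query string.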

Current Progress

I am using Vercel serverless functions for now and grabbing data from a climateTechFeeds array. Once @shreeup gets our Flask app finalized, I'll look at porting this to an API endpoint in the Flask app if/when needed.

If Vercel ends up working standalone for this, that's fine too.

MVP code

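import { VercelRequest, VercelResponse } from '@vercel/node';
import fetch from 'node-fetch';
import { parseStringPromise } from 'xml2js';
import { climateTechFeeds } from '../data/climateTech';

export default async function handler(req: VercelRequest, res: VercelResponse) {
  // CORS: echo the origin back only when it is on the allow-list
  const allowedOrigins = ['http://127.0.0.1:8000', 'https://www.climatetechhandbook.com'];
  const origin = req.headers.origin || '';
  if (allowedOrigins.includes(origin)) {
    res.setHeader('Access-Control-Allow-Origin', origin);
  }
  res.setHeader('Access-Control-Allow-Methods', 'GET');

  // Fetch and parse all RSS feeds in parallel
  const results = await Promise.all(
    climateTechFeeds.map(async (feed) => {
      try {
        const response = await fetch(feed.urls[0]);
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }
        const xml = await response.text();
        // items holds the whole parsed XML document, not just the <item> entries
        const json = await parseStringPromise(xml);
        return { ...feed, items: json };
      } catch (error) {
        // Skip a feed that fails to download or parse instead of failing the whole request
        console.error(`Skipping ${feed.urls[0]}:`, error);
        return null;
      }
    })
  );
  const parsedFeeds = results.filter((feed) => feed !== null);

  res.status(200).json(parsedFeeds);
}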
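
Since the handler above stores the entire parsed document under `items`, a next step could be pulling out just the entries. Below is a rough sketch, assuming RSS 2.0 feeds and xml2js's default options (where every child element comes back as an array). The `extractRssItems` name and the `FeedItem` shape are hypothetical, and Atom feeds (`feed > entry`) would need their own branch.

```typescript
// One <item> as xml2js emits it with default options: each child is an array.
interface RawRssItem {
  title?: string[];
  link?: string[];
  pubDate?: string[];
}

// Hypothetical flattened shape for downstream use.
interface FeedItem {
  title: string;
  link: string;
  published: string | null;
}

// Pull a flat list of items out of a parsed RSS 2.0 document.
// Returns [] for anything not shaped like rss > channel > item.
function extractRssItems(parsed: unknown): FeedItem[] {
  const doc = parsed as { rss?: { channel?: Array<{ item?: RawRssItem[] }> } } | null;
  const rawItems = doc?.rss?.channel?.[0]?.item ?? [];
  return rawItems.map((item) => ({
    title: item.title?.[0] ?? '',
    link: item.link?.[0] ?? '',
    published: item.pubDate?.[0] ?? null,
  }));
}
```

In the handler this would be used as `items: extractRssItems(json)`. Elements that carry attributes can come back from xml2js as objects rather than plain strings, so real feeds may need extra handling.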

My Current Ongoing Development

  • Gather as many "climate tech" related feeds as possible.
  • Gather them all into one unified bridge
  • Parse through the massive bridge and separate into 13 sectors
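
For the sector-splitting step, one simple starting point could be keyword matching on each item's title or description. The sketch below is only an illustration: the sector names and keywords are made up (our real 13 sectors aren't listed in this issue), and `classifyBySector` is a hypothetical name.

```typescript
// Hypothetical sectors and keywords, for illustration only.
const sectorKeywords: Record<string, string[]> = {
  energy: ['solar', 'wind', 'battery', 'grid'],
  buildings: ['insulation', 'heat pump', 'retrofit'],
  transportation: ['electric vehicle', 'transit', 'aviation'],
};

// Return every sector whose keywords appear in the text
// (case-insensitive substring match), or ['uncategorized'] if none do.
function classifyBySector(text: string): string[] {
  const haystack = text.toLowerCase();
  const matches = Object.entries(sectorKeywords)
    .filter(([, keywords]) => keywords.some((keyword) => haystack.includes(keyword)))
    .map(([sector]) => sector);
  return matches.length > 0 ? matches : ['uncategorized'];
}

console.log(classifyBySector('New heat pump rebates and solar incentives'));
```

Plain substring matching is crude ("wind" also matches "window"), so this is probably only good enough for a first pass before the fine-tuned model can take over.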

How to help

  • Gather more feeds into a master list
  • Start designing UI/UX for the end feed (integrate into mkdocs? a separate standalone HTML doc? a separate app framework altogether?)
  • Start planning integration with @Yinan0409's fine-tuning progress (how are we going to train based on these feeds?)
  • Start gathering a list of Google RSS feeds for each associated job we have in our Google Drive documents (a quick script could probably handle this)
  • Make suggestions, open more specific issues and feature requests, and get involved. Even comments and discussion are highly encouraged
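
If several people contribute to the master list, duplicates will likely creep in. Here is a small sketch of how the list could be de-duplicated. It assumes that URLs differing only in host casing, a trailing slash, or a fragment point at the same feed, which may not hold for every site. Both function names are hypothetical.

```typescript
// Normalize a feed URL so trivially different spellings compare equal.
// Throws if the input is not a valid absolute URL.
function normalizeFeedUrl(raw: string): string {
  const url = new URL(raw.trim()); // the URL parser lowercases the host
  url.hash = '';
  // Assumption: ".../feed" and ".../feed/" are the same feed.
  url.pathname = url.pathname.replace(/\/+$/, '') || '/';
  return url.toString();
}

// Keep the first spelling seen for each normalized URL, preserving order.
function dedupeFeeds(urls: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const raw of urls) {
    const key = normalizeFeedUrl(raw);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(raw.trim());
    }
  }
  return unique;
}
```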
@bigolboyyo
Contributor Author

I'll have a more detailed update shortly and a better README for this repo, but if you want to get a head start, feel free to check it out in the meantime.

current deployment: https://rss-functions.vercel.app/api/climateTechRSS

Projects
Status: Considering
Development

No branches or pull requests

3 participants