diff --git a/projects/Scraping Medium Articles/README.md b/projects/Scraping Medium Articles/README.md
new file mode 100644
index 000000000..a18fa9fcc
--- /dev/null
+++ b/projects/Scraping Medium Articles/README.md
@@ -0,0 +1,14 @@
+# Scraping Medium Articles
+[Medium](https://medium.com/) is a publishing platform with many high-quality articles, and it is widely read by programmers.<br>
+This script asks the user for the URL of a Medium article (the URL must be on `medium.com`), scrapes its text, and saves it as a text file in a folder named `scraped_articles` next to the script.<br>
+The `scraped_articles` folder contains three example text files that show what a scraped article looks like.
+
+### Prerequisites
+Install the modules listed in `requirements.txt` with `pip` (`pip install -r requirements.txt`).<br>
+A working network connection is required.
+
+### How to run the script
+Run it like any other Python file (`python scraping_medium.py`) and paste the article URL when prompted.
+
+## *Author Name*
+[Naman Shah](https://github.com/namanshah01)
diff --git a/projects/Scraping Medium Articles/requirements.txt b/projects/Scraping Medium Articles/requirements.txt
new file mode 100644
index 000000000..7b19ef560
--- /dev/null
+++ b/projects/Scraping Medium Articles/requirements.txt
@@ -0,0 +1,2 @@
+beautifulsoup4==4.9.1
+requests==2.23.0
diff --git a/projects/Scraping Medium Articles/scraped_articles/One_month_into_the_MLH_Fellowship.txt b/projects/Scraping Medium Articles/scraped_articles/One_month_into_the_MLH_Fellowship.txt new file mode 100644 index 000000000..a20d7aa50 --- /dev/null +++ b/projects/Scraping Medium Articles/scraped_articles/One_month_into_the_MLH_Fellowship.txt @@ -0,0 +1,58 @@ +url: https://medium.com/code-for-cause/one-month-into-the-mlh-fellowship-448249f61590 + +Title: ONE MONTH INTO THE MLH FELLOWSHIP +by Kunal Kushwaha + +INTRODUCTION + +One month into the MLH FellowshipKunal KushwahaFollowJul 5 · 8 min read + +“In real open source, you have the right to control your own destiny.” +— Linus Torvalds +What is the MLH Fellowship? +The MLH Fellowship is an internship alternative for software engineers, with a focus on Open Source projects. Instead of working on a project for just one company, students contribute to Open Source projects that are used by companies around the world. At the beginning of the program, fellows are placed into small groups called “pods” that collectively contribute to the assigned projects as a team under the educational mentorship of a professional software engineer. +Open source is a great way to get real-world software development experience from the comfort of your home. The open source community is very helpful and encourages new developers to take part in their organizations. 
One gains exposure, can test their skills, gain knowledge and bond with the community in order to produce quality code that helps people around the world. +The Process +I found out about the program via the MLH mailing list. Being an Open Source enthusiast, I was impressed by the structure of the program. Having attended past MLH events, I knew I had to sign up for this. The initial phase was the shortlisting of applications followed by a technical interview. Apart from work, the fellowship program also provides opportunities to build a network and have fun while doing so! +Result of my application +Students get to work on the latest Open Source technologies and are matched with projects according to their skills and interest, providing students with a learning opportunity while contributing to real-world projects. But, it’s not just about coding. Soft-skills and team-building exercises are conducted by MLH regularly, in addition to technical hands-on workshops! It’s a remote opprtunity but provides a global platform for students to showcase their skills. +Students are also provided with a monthly stipend to help cover the basic living expenses during their participation in the program. +Source: https://github.blog/2020-06-24-welcome-to-the-inaugural-class-of-mlh-fellows/ + +WEEK 1 +Alright, so the first week. This week was spent getting acquainted with the Fellowship system as well as getting to know the team members. I got introduced to some amazing community members during this time. Being an open-source enthusiast, I believe that diversity in the workplace and participation from people hailing from different cultures is necessary as well as instrumental for the growth of the IT sector. It exposes one to the multitude of values and principles that people from varying ethnicities hold. Meeting people from around the world teaches people to respect opposing perspectives and opinions, and ingrains in them respect for their peers. 
+We followed an exercise in which each fellow had to have a 1-on-1 get to know meeting with each of their Pod members which I believe this was a great way to get to know each other. We also got introduced to our mentor Jani, who has been a great motivation throughout the program and is helping each and every one of us achieve more, both in terms of technical as well as soft skills. During one of our first stand-ups, we decided on the name of our Pod together as a Team. I remember Jessie (my Podmate) suggested Reactive Sharks and I suggested Hackathon Sea-Son (as the theme was marine), and that’s how we ended up with Reactive Sea-Son (the best pod). +Reactive Sea-Son Logo +The first week ended with an Orientation Hackathon where we were divided into groups of 3–4. I got to see so many amazing projects presented by my fellow fellows. Our team Quarantime (pun intended) built a social media platform using MERNG stack, for students to use during the quarantine. + +These are the projects that I really thought went out of the box! +MLH-Fellowship/0.4.2-cssifyTired of using Bootstrap/Bulma, but don't want to scaffold a whole bunch of CSS on your own? CSSify to the rescue …github.com +MLH-Fellowship/0.4.1-Execute.ly-serverServer: Edit and execute handwritten or any code in an image right in your browser. …github.com + +WEEK 2 +This week started with the announcement of hackathon results and team Execute.ly from our Pod bagged the first prize! We all were really proud of our team, also because everyone in the winning team’s Pod was going to get prizes xD. I also spent some time this week to design our Pod’s logo. +We were excited for week 2 as this was the week during which our mentor was going to assign us projects. I found out that I will be contributing to Jest this summer. Jest is a JavaScript testing framework maintained by Facebook. It felt amazing that the code that I am going to write is going to be used by people around the world. 
Plus getting involved in the community of experienced developers is itself a huge learning opportunity. +After having a much project kickoff call with the Jest maintainers, the rest of the week was spent into learning more about the projects. I believe that writing blogs is a great way to show what you have learned to the community and help other newcomers in the projects as well. Keeping that in mind I wrote a blog on the architecture of Jest, provided below. +Jest ArchitectureWhy is Testing important?medium.com + +WEEK 3 +Week 3 started on a Monday with our daily standup. This was the week of coding and exploring more about the projects assigned to us. We also got introduced to weekly retrospectives and show and tells. Weekly retrospectives are a way to communicate with your team and let them know about your progress, shoutouts, and any blockers they might be facing. It’s divided into sub-points like: +Shoutouts (Optional Thank You’s / Recognition) — If anyone went above and beyond, let them know!Red (Stop / need help) — List out areas that have been challenging. This could include projects, tasks, workload, or challenges with Podmates. What didn’t work well this week? What can be done differently next week?Yellow (Use caution) — Provide context on areas of improvement. This could include projects, tasks, workload, or challenges with Podmates. What can be improved upon for next week? What resources and tools could you use to reach success?Green (All Good!) — Highlight What some of your successes were. What has gone well this week? Give examples of your weekly wins! This could include projects, tasks, or successes in teamwork. +Pod Retrospective +This is the week we started conducting show and tells. I had never been a part of such activity before where a person publicly presents what they have learned to a group of people and then they all have discussions over it. It seemed like a great learning opportunity for everyone and I highly recommend it. 
I volunteered for our first show and tell to give a demo on Docker, Kubernetes, and Red Hat’s Java K8s client. And I must say, it went amazing! Everyone, including me, learned a lot. I started with an introduction to the topics following a hands-on demo. Whatever discussion we have as a team, one of the best parts is the guidance and perspective we receive from our mentor, Jani, on the topics of discussion to relate it to the real-world. Shout to everyone on our team for being an amazing audience and for their active participation ☀️ +Kubernetes Made EasyWhat is Kubernetes?medium.com +This was also the week when we got some PRs flowing to Jest. Shout out to Saurav, who is an amazing teammate and it has been an amazing experience contributing to Jest with him. I also got to attend various workshops this week conducted by MLH. My favourite one this week was an Introduction to Network Security by Kyle 👨‍💻 + +WEEK 4 +E-Liang’s show and tell +Week 4, better known as the week of PRs. The highlight of this week was the show and tell by E-Liang and the launch of Foam by Jani. For our second show and tell this week, we had E-Liang as a volunteer. This was hands-down my favorite personal project by a person. We got to know about Gent, which is a lightweight, reusable business logic layer that makes it easy to build GraphQL servers in Node.js and TypeScript, which is heavily inspired by Ent, a Facebook Open Source project. +taneliang/gentGent is a lightweight, reusable business logic layer that makes it easy to build GraphQL servers in Node.js and…github.com +I also got to attend a lot of sessions this week such as the React-Native session by our mentor Jani, webinar on working remotely by Joe Nash, and a discussion about “Designing Your Life” by John Britton who shared his inspiring journey with the Fellows and other community members. I also had a one-on-one mentorship session with Jani which was really educational. 
I got several life lessons and pointers on how to be a better developer and get the most out of my learning experience. +By far the most impressive thing this week was the release of Foam, a personal knowledge management and sharing system, by Jani. The project blew up in a matter of days and now has more than 4.4k stars on GitHub!! That’s a big number. Check it out: +foambubble/foam👋 Hello friend! Looks like you're reading this page on GitHub. Please go to the 👉 rendered Foam Workspace for an…github.com +So far the journey has been amazing, unlike any other program that I’ve been a part of. It’s a perfect balance between education and contributions + having fun while doing so. +The end of the week was followed by a delightful session of pictionary with the MLH fellows. 🖼 + +CONTACT: +Twitter: https://twitter.com/kush_kunal +Thanks for reading!Gonna clap this one out like we do in standups 👏 \ No newline at end of file diff --git a/projects/Scraping Medium Articles/scraped_articles/One_stop_guide_to_Google_Summer_of_Code.txt b/projects/Scraping Medium Articles/scraped_articles/One_stop_guide_to_Google_Summer_of_Code.txt new file mode 100644 index 000000000..5f08c3e63 --- /dev/null +++ b/projects/Scraping Medium Articles/scraped_articles/One_stop_guide_to_Google_Summer_of_Code.txt @@ -0,0 +1,65 @@ +url: https://medium.com/coding-blocks/one-stop-guide-to-google-summer-of-code-a9e803beeda7 + +Title: ONE STOP GUIDE TO GOOGLE SUMMER OF CODE +by Harshit Dwivedi + +INTRODUCTION + +One stop guide to Google Summer of CodeHarshit DwivediFollowJul 18, 2018 · 9 min read +Getting bombarded with tons of messages and requests on the same topic over and over again, I was about to post a “If I had a penny for …” joke on my social media handles. +But instead, why not write a blog instead containing all the ifs, buts, whys and hows on Google Summer of Code. 
+So if you are a student who is wondering about getting into Google Summer of Code or someone who has been pestered with questions regarding GSoC, hang on tight, while this is going to be a long one, I can assure you that this is going to helpful. +Let’s first get the basics out of the way : + +WHAT IS GOOGLE SUMMER OF CODE? +Simply put, it’s a 16 week long program by Google aimed at promoting Open Source Software development among college and university students. +You work with one of the many Open Source Organizations on a language/framework of your own choice. +In return you get : +1. An excellent experience of working on a real world project +2. A chance to get mentored by some of the best software developers from tech giants like Facebook and Google +3. The Google Summer of Code Tag, that will benefit you immensely with all your job hunts and not to mention a Golden referral to apply for any role at Google! +4. Of course the money and the bragging rights! 😎 💰 + +WHAT GOOGLE SUMMER OF CODE IS NOT? +An Internship! +Google Summer of Code isn’t an internship and it definitely isn’t you interning at Google. +It’s merely Google providing you and the Open Source Organizations a platform to work together.A direct entry into Google +While it’s true that you get a referral to apply at any opening in Google, GSoC does not give you a direct pass into Google. +You still have to go through all the interview rounds, it just gives you an extra edge over the competition.A trend that you **have to** be a part of +Please, don’t treat GSoC as an IIT JEE entrance exam that you have to crack in order to be successful. +I’ve seen folks achieve wonderful things without even doing GSoC and vice versa, so preparing for and applying into GSoC just because every other Tom, Dick and Harry is doing so it ridiculous. 
+Students, especially Indian students need to understand the essence and the deeper meaning behind the program and only go for it if it’s something you truly resonate with and are willing to continue long after the program ends and you are not getting paid for it. + +WHEN DOES GOOGLE SUMMER OF CODE HAPPEN? +The application process officially starts sometime around March, but the selected organizations are announced sometime in February first week. +So students can start looking into the selected organizations and shortlist projects which interest them. + +HOW DO I GET INTO GOOGLE SUMMER OF CODE? +Getting into GSoC is not a one step process; rather it’s a multi-step process ranging from February - April and you need to perfect each and every step to maximize your chances. +I’ll outline the major steps below : +1. Start early! +Since GSoC isn’t a one step process as mentioned beforehand, you need to get started as early as possible which means shortlist the organization(s) and project(s) which interest you and start contributing to them as soon as they are announced by google. +However, some students also start way early on in November/December. Instead of waiting for new organizations to be announced, they shortlist few organizations which have been selected continuously for the past few years. +While this is risky, if done properly and carefully, it does give you an edge over others, since the number of contributions and interactions you’ve had with the organization factors in a lot while applying for GSoC under that organization. +P.S. While you have to work with one, to maximize your chances, you can apply for 3 organizations/projects, so select them carefully. +2. Contribute +This is probably the most important phase of GSoC. +Once you’ve shortlisted, you have to focus on contributing as much as possible to the organization(s) you’ve selected. 
+Pro Tip : Don’t select more than 3 organizations, it’ll only diminish your chances since you won’t be able to focus properly on any one of those. +What does a contribution mean? +Anything from fixing/reporting an issue in the project or implementing a new feature to writing documentation for setting up and using the project counts as a contribution. +Granted each of them has a different weight attached with them, for example fixing an issue/adding a new feature is generally contains more weightage than reporting an issue or writing documentation. +But as someone newly exploring to a project, starting off with filing issues and writing documentation is a good idea. +3. A good proposal will help you hit it home +A proposal is a document which you submit to the organization(s) you’ve selected in the above step which outlines a detailed breakdown on how you plan on enhancing/building the project in the 16 week coding period of Google Summer of Code. +Your proposal is going to be the secret key towards ensuring your selection so ensure that you are putting in extra efforts towards making it as detailed and informative as proposal. +P.S. Please do not float the same proposal across multiple projects/organizations, each project should have a separate proposal of its own. +I won’t be outlining the best practices on writing a good proposal as I believe the following blog does a fantastic job at it, so I encourage you to go through it before starting off with your proposal. +Also, here’s my proposal, in case you want to refer to it and get a general sense of how a proposal should be made. ;) +https://drive.google.com/file/d/0B6OtIpAL6oa6U3JURDA2cjVZVlZ5UUVqcXRBTGlrY0hmUkVV/view?usp=sharing +4. Repeat +After you’re done with submitting your proposal, don’t sit idle. +You get a window of 1 month from the day when you submit the proposal to the day when the selected students are announced. +Make the best of this opportunity to contribute even more to maximize the chances. 
+P.S. Interacting with the organization members publicly and giving them your feedback on upcoming features and releases is also a potential contribution that can be done. +You can generally find the contact link for an Organization at it’s page in the GSoC website. \ No newline at end of file diff --git a/projects/Scraping Medium Articles/scraped_articles/The_Pros_and_Cons_of_Open_Source_Software.txt b/projects/Scraping Medium Articles/scraped_articles/The_Pros_and_Cons_of_Open_Source_Software.txt new file mode 100644 index 000000000..ea7dacab6 --- /dev/null +++ b/projects/Scraping Medium Articles/scraped_articles/The_Pros_and_Cons_of_Open_Source_Software.txt @@ -0,0 +1,42 @@ +url: https://medium.com/4thought-studios/the-pros-and-cons-of-open-source-software-d498304f2a95 + +Title: THE PROS AND CONS OF OPEN SOURCE SOFTWARE +by Khalil Khalaf + +INTRODUCTION + +The Pros and Cons of Open Source Software +Is open source software right for your business? +Khalil KhalafFollowJul 11, 2017 · 6 min read + +The term “open source” refers to products designed to be publicly accessible for people to use, modify and share. Open source software is software that anyone can access, inspect and enhance the source code that most users don’t ever see in normal circumstances. A source code is a list of text commands that is written by computer programmers, to be compiled or assembled into an executable computer program. +You might have heard of open source software and may have been encouraged to give it a try. After all, why pay for Autocad when you can use Qcad to create blueprints for your building, computer chips and car parts? Why pay for Photoshop when you can use Gimp to edit and enhance your images? Why pay for Microsoft Office while you can use LibreOffice to write, calculate and do excellent presentations? +There are many legitimate advantages to using open source software. 
However, there are downsides to using them, especially from the standpoint of day to day business life and development. Before committing to open source software, you should consider the following advantages and disadvantages. + +ADVANTAGES: + +Free and/or Cheaper than Commercial Products. +Open source software comes with a great advantage since it can be installed for free. Furthermore, it can be used and deployed again and again on multiple machines without the need of tracking the license compliance and terms of use. For example, according to Kate Rockwood, “…Instead of sinking 375 days — and $500,000 — into developing a proprietary code, Pendo went in the completely opposite direction: It down­loaded an open-source software engine coded entirely by volunteer[s].” +Open source software help companies save the time and money by providing ready to use software as a whole. This software could be plugins (features to be added to existing software), Front ends and interfaces that are easy to integrate, or Back ends and easy to use engines. This might sound unbelievable, but open source programs are developed with the intention to be available to anyone, even those who can’t afford commercial software. Furthermore, many of these programs are created to work with almost any type of platform, which helps extend your hardware life and avoids the need to constantly replace them. +In the Software Development Life Cycle, there are three stages that are often underestimated by project managers: Testing, Debugging and Integration. If you are a software development company, you likely know now — after disappointing your clients — that these three stages consume almost the same time as time dedicated to other stages of the software project. Open source software is good at cutting down on the development and reduces the pain and time of development planning and stages. +Highly Reliable. +Open source software is usually developed by a group of talented and skillful experts. 
Sometimes, they are developed by tens or hundreds of volunteers that simply love what they do for the community. Hence why most of the open source software are high-quality programs. Also, since anyone can access the code and fix a bug, you will notice continuous improvement and new versions or features added to the software every now and then. This improvement and the code itself will always exist even if it was originally developed by a current dissolved company. +Also, you should know that any open source software can be customized and tweaked by you, which can help your company match the software with your business’s needs. You literally can do whatever you want with it and you aren’t locked into packages that are only compatible with each other. This can be especially helpful if you are a software development company. For example, if a client asked for a software with 10 features, you can download an open source with 5 features already done, and add in the missing 5 features. Or maybe an open source program with 10 features that do not match your client’s requirement but then you modify them for the perfect match. Or even an open source software with 15 features and simply remove or hide the additional 5 features. The point is, with this level of customization you can guarantee that the software could be reliable since it can be tweaked specifically by you. + +DISADVANTAGES: + +Not as User-Friendly as Commercial Software + +This cannot be generalized for all open source software. For example LibreOffice, Mozilla Firefox and Android OS are amazingly easy to use. However, while there are several open source software that solve large problems super fast, complicated computation or big data, but sometimes not much attention is given to its GUI (Graphical User Interface). This can make the software annoying to work with especially for nontechnical users. 
Nontechnical companies may need to dedicate some time to train their team and get them up to speed for every new release of these open source programs. As for technical companies, especially software development companies, they may need to build a proper GUI and integrate it with the back end which may require as much time and money as rewriting the whole software. +Lack of extensive tech support +User communities are out there and can be very responsive, but you really can’t count on the community one hundred percent of the time since it is not their job. No one is getting paid for fixing your bugs, provide you or your team the proper training, or respond to your questions and requirements. If your client or employee is suffering from a bug, you are literally on your own. The best thing to do might be to just wait for somebody in the community to face the same issue and hopefully fix it. The other option would be to hire an expert dedicated to maintaining and improving the software. +Most of the times, you will also need to get your team up to speed. This is because of the constant development and in parallel between several community developers of open source software. Due to this, there is often confusion among the team since they are uncertain which version does what and if its compatible with other software and platforms. Hence where additional cost comes with every open source software. + +FINAL THOUGHTS: +As a software developer myself, I provided the community with all the software that I wrote for personal projects. I have used open source software for personal needs and within software development jobs. I recommend using open source software since it saved me and my employer a lot of time and money, and made my clients happier. Also, looking at other developer’s codes and algorithms improved my skills in reusing codes that were written by someone else, and my experiences when reviewing others’ algorithms and logic. 
+As a result of all the benefits, I always contribute in online communities helping other developers and users. I consider doing that as giving back to the community. I would not have been a developer without the help of other developers who did to me in the past exactly what I am doing to others now. There is absolutely no passion for me to abandon a project and never reply to users, and I can say I never had problems in using an open source software; specifically to mention ones that got abounded. We developers love improving and fixing things. We simply love what we do. + + +FURTHER READING: +Using Open-Source Code Can Save You Half a Million Dollars — but Do It CarefullyNine thousand hours. That’s how much time financial tech firm Pendo Systems estimates it would take to write the code…www.inc.com +
diff --git a/projects/Scraping Medium Articles/scraping_medium.py b/projects/Scraping Medium Articles/scraping_medium.py
new file mode 100644
index 000000000..4680dca49
--- /dev/null
+++ b/projects/Scraping Medium Articles/scraping_medium.py
@@ -0,0 +1,71 @@
+import os
+import sys
+import requests
+import re
+from bs4 import BeautifulSoup
+
+# switch to the directory that contains this script (works on any OS)
+os.chdir(os.path.dirname(os.path.abspath(__file__)))
+
+# function to get the html of the page
+def get_page():
+    global url
+    url = input('Enter url of a medium article: ')
+    # reject anything that is not a medium.com URL
+    if not re.match(r'https?://medium\.com/', url):
+        print('Please enter a valid URL, and make sure it is a Medium article')
+        sys.exit(1)
+    res = requests.get(url, timeout=30)
+    res.raise_for_status()
+    soup = BeautifulSoup(res.text, 'html.parser')
+    return soup
+
+# function to remove all the html tags and replace some with specific strings
+def purify(text):
+    rep = {"<br>": "\n", "<br/>": "\n", "<li>": "\n"}
+    rep = dict((re.escape(k), v) for k, v in rep.items())
+    pattern = re.compile("|".join(rep.keys()))
+    text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
+    text = re.sub(r'<(.*?)>', '', text)
+    return text
+
+# function to compile all of the scraped text in one string
+def collect_text(soup):
+    fin = f'url: {url}\n\n'
+    main = (soup.head.title.text).split('|')
+    global title
+    title = main[0].strip()
+    fin += f'Title: {title.upper()}\n{main[1].strip()}'
+
+    header = soup.find_all('h1')
+    j = 1
+
+    try:
+        fin += '\n\nINTRODUCTION\n'
+        for elem in list(header[j].previous_siblings)[::-1]:
+            fin += f'\n{purify(str(elem))}'
+    except IndexError:
+        pass
+
+    fin += f'\n\n{header[j].text.upper()}'
+    for elem in header[j].next_siblings:
+        if elem.name == 'h1':
+            j += 1
+            fin += f'\n\n{header[j].text.upper()}'
+            continue
+        fin += f'\n{purify(str(elem))}'
+    return fin
+
+# function to save the file in the scraped_articles folder
+def save_file(fin):
+    if not os.path.exists('./scraped_articles'):
+        os.mkdir('./scraped_articles')
+    fname = './scraped_articles/' + '_'.join(title.split()) + '.txt'
+    with open(fname, 'w', encoding='utf8') as outfile:
+        outfile.write(fin)
+    print(f'File saved as {fname}')
+
+# driver code
+if __name__ == '__main__':
+    fin = collect_text(get_page())
+    save_file(fin)