Prevent npm install for JavaScript functions #6609

Open
rudfoss opened this issue Sep 7, 2020 · 8 comments

@rudfoss

rudfoss commented Sep 7, 2020

We are attempting to use puppeteer in an Azure Function to produce PDFs using Chromium. We have a mono-repo setup that uses yarn to manage dependencies and we need to install puppeteer along with a package.json file in production in order for puppeteer to pick up the Chromium install.

The problem is that according to this documentation Azure Functions will run npm install automatically when it detects a package.json file, but because we are using yarn to manage dependencies this install will not respect the yarn.lock file version hashes.

We are easily able to do the install deterministically during our build process, but we need to have the package.json file alongside the project in order for puppeteer to find the chromium install. This in turn triggers an npm install during deploy which will install our dependency, but not in a deterministic way as it does not know about yarn.lock.

Is there a way to prevent Azure Function deployments from triggering npm install even when there is a package.json present?

@anthonychu
Member

@rudfoss Can you share more details about the build/deploy process you'd like to use? The docs you pointed to probably require a bit of updating as it appears to be specific to deploying to Windows. For now, we can help you figure out what works best for your situation.

The first thing to keep in mind is that headless Chromium currently only works in our Linux environments (this was recently added, more info in this blog post here). There are multiple ways to deploy your app, which is why understanding how you want to deploy is important. I've put together a sample app that uses Yarn and will walk you through how to deploy it. Hopefully that'll work with your CI/CD process. If not, we can figure something out.

Here's the sample app. I've added a super old version of moment.js (2.0.0) to it and it's in the yarn lockfile.

The simplest way to deploy this app is with the Azure Functions Core Tools using remote build. This runs a build in the cloud. In Linux (perhaps Windows as well now), remote build uses Oryx to build apps and it's able to detect and use Yarn for Node projects.

To deploy the sample to a Node.js 12 Linux Consumption app, run this command:

func azure functionapp publish $APP_NAME -b remote

You should see in the output that it noticed yarn.lock and is installing dependencies using Yarn instead of npm.

Using Node version:
v12.16.1

Using Yarn version:
1.17.3

Configuring Yarn cache folder...
yarn config v1.17.3
warning package.json: No license field
success Set "cache-folder" to "/usr/local/share/yarn-cache".
Done in 0.04s.

Running 'yarn install --prefer-offline'...

yarn install v1.17.3
warning package.json: No license field
warning 20200907-puppeteer-yarn@1.0.0: No license field
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 40.60s.

One way to browse the files after they've been deployed is using the Azure Functions VS Code extension. You can see that yarn correctly used the lockfile to install moment@2.0.0 instead of the latest that satisfies ^2.0.0.

[image: screenshot of the deployed files as shown in the VS Code extension]
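The same check can be scripted instead of browsing the files by hand. The following is a minimal sketch, not part of the sample app: the helper names `installedVersion` and `assertPinned` are made up for illustration, and it assumes the app root contains a regular `node_modules` folder.

```javascript
const fs = require("fs");
const path = require("path");

// Read the version that actually ended up in node_modules for one package.
function installedVersion(appRoot, packageName) {
  const manifest = path.join(appRoot, "node_modules", packageName, "package.json");
  return JSON.parse(fs.readFileSync(manifest, "utf8")).version;
}

// Throw if the installed version differs from the version the lockfile pins.
function assertPinned(appRoot, packageName, expected) {
  const actual = installedVersion(appRoot, packageName);
  if (actual !== expected) {
    throw new Error(`${packageName}: expected ${expected}, found ${actual}`);
  }
  return actual;
}

// Hypothetical usage from the app root after an install:
//   assertPinned(".", "moment", "2.0.0");
```

Running something like this as a pipeline step would fail the build early if an install ever ignored the lockfile.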

You should be able to use Core Tools to deploy from your CI/CD pipeline. If you use specific tasks for deploying Azure Functions (like in GitHub Actions or Azure DevOps), I'm not sure if it's possible to run the build remotely. One potential problem if you're not using remote build is that the execute bit on the Chromium binary might not be set properly. We can help you if you run into this with your deployment method.

/cc @ggailey777 on possible doc updates

@anthonychu anthonychu added this to the Triaged milestone Sep 7, 2020
@rudfoss
Author

rudfoss commented Sep 8, 2020

Hi @anthonychu! Thanks for getting back to me so quickly. I've read your answer and unfortunately I don't think we can easily rewrite our pipeline to support this. I'll explain our pipeline in more detail:

Code structure

Our code is structured in a monorepo using yarn workspaces. In essence this means our project folders look like this:

packages/azure-functions
packages/front-end
packages/utilities
...

Each package is basically a separate node project with its own package.json containing the specific dependencies for that package. However, yarn workspaces manages packages centrally with one single yarn.lock file containing exact version checksums for each package. This structure allows us to share dependencies between projects as well as easily import modules from one package into another. E.g.:

// in our azure function:
import { findObject } from "utilities/findObject"

This is important because it greatly reduces the need for duplicating functionality and allows us to use strongly typed objects that share an interface between projects.

In addition to this our azure-functions project is internally structured like this:

functions/function-a
functions/function-b
functions/function-c

Each function folder contains a separate azure function with its own function.json:

functions/function-a/function.json
functions/function-a/src/index.ts

Build

We've set up a build script that will build each function using webpack and output a single index.js file for each of them. The reason for using webpack is to support the cross-project imports described above in a simple manner without having to deploy every dependent project with the functions. It also gives us a clean output bundle that we currently zip and deploy through Azure DevOps using the task AzureRmWebAppDeployment@4 in a DevOps yaml pipeline.

The build produces a dist folder that basically matches the folder structure Azure Functions expects on the server. It looks like this:

dist/host.json
dist/function-a/index.js
dist/function-a/function.json
dist/function-b/index.js
dist/function-b/function.json

The upside of this is that there is no need for a package.json file or installing dependencies in the Azure function host. Everything is there and ready for use within the index.js file. We already have three functions running like this and it works great.

The problem with puppeteer

Now we want to build a new function that follows the regimen above. Its only task is to receive a url through a service bus topic message, print a PDF of that url using puppeteer and dump the result to a Storage Account. We're already running our function app on Linux, so according to the blog you linked this should not be a problem, and indeed it seems to work fine when we deploy the function completely outside of our build pipeline.

The problem is that puppeteer expects to find a package.json in the current working directory process.cwd() (or up) that will help it find the local node_modules folder and further the local Chromium install. We've updated our build to dump a standalone package.json file containing only a reference to puppeteer and altered our webpack build to exclude the puppeteer dependency from our bundle process:

dist/host.json
dist/package.json <- Contains only the puppeteer dependency
dist/yarn.lock <- Our monorepo yarn.lock file to force exact versions of puppeteer and derivative dependencies
dist/function-a/index.js
dist/function-a/function.json
dist/function-b/index.js
dist/function-b/function.json
dist/function-pdf/index.js <- The PDF generator function
dist/function-pdf/function.json

With this structure it is trivial for us to run yarn install --frozen-lockfile in our DevOps build pipeline to produce a node_modules folder containing only puppeteer, its dependencies and Chromium, just like puppeteer expects. We then zip up everything and deploy it using the DevOps pipeline task AzureRmWebAppDeployment@4.

The problem with Azure Functions

Now, because Azure Function deployments look for package.json, the deployment will try to run npm install. However, it will not find a package-lock.json file because we use yarn, so it will not respect our exact package versions and may produce a non-deterministic build. If we could ensure that npm install does not run, we could guarantee determinism ourselves, because we run yarn install --frozen-lockfile as part of our build pipeline and deploy everything to the Azure Function as one big package.

Our solution

After writing this issue yesterday we actually seem to have found a solution for the problem. By moving our package.json file into the function subfolder we are able to skip the automated npm install and still maintain the puppeteer requirement of having a node_modules folder with the local Chromium install. Our dist folder now looks like this:

dist/host.json
dist/function-a/index.js
dist/function-a/function.json
dist/function-b/index.js
dist/function-b/function.json
dist/function-pdf/index.js <- The PDF generator function
dist/function-pdf/function.json
dist/function-pdf/package.json <- Contains only the puppeteer dependency
dist/function-pdf/yarn.lock <- Our monorepo yarn.lock file to force exact versions of puppeteer and derivative dependencies

We then run yarn install --frozen-lockfile within the function-pdf folder resulting in a node_modules folder with puppeteer and Chromium next to the function itself. The end result is a folder structure like this:

dist/host.json
dist/function-a/index.js
dist/function-a/function.json
dist/function-b/index.js
dist/function-b/function.json
dist/function-pdf/index.js <- The PDF generator function
dist/function-pdf/function.json
dist/function-pdf/package.json <- Contains only the puppeteer dependency
dist/function-pdf/yarn.lock <- Our monorepo yarn.lock file to force exact versions of puppeteer and derivative dependencies
dist/function-pdf/node_modules/... <- Contains puppeteer and its dependencies

This is what we zip and deploy.
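Since the workaround depends on there being no package.json at the root of dist, a small guard before zipping might be worthwhile. This is a sketch that assumes, based only on what we observed above, that a root-level package.json is what triggers the install; the function names are made up.

```javascript
const fs = require("fs");
const path = require("path");

// List root-level files in distDir that, per the observation above,
// appear to trigger an automatic install on deploy.
function rootInstallTriggers(distDir) {
  return ["package.json"].filter((name) => fs.existsSync(path.join(distDir, name)));
}

// Fail the build if such a file slipped into the root of the dist folder.
function assertNoRootManifest(distDir) {
  const found = rootInstallTriggers(distDir);
  if (found.length > 0) {
    throw new Error(`Unexpected root-level files in ${distDir}: ${found.join(", ")}`);
  }
}

// Hypothetical usage in a build script, just before zipping:
//   assertNoRootManifest(path.resolve(__dirname, "dist"));
```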

Now since the current working directory of an Azure Function is the root folder (wwwroot) we have to start our function with the line:

process.chdir(__dirname)

This alters the cwd for all following commands ensuring puppeteer will look for Chromium next to the index.js file where the node_modules folder is found. Then we can simply run:

const browser = await puppeteer.launch()

to start the browser and do our thing.
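A slightly more contained variant of the same idea is to change the working directory only around the launch call and restore it afterwards. This is a sketch, not what we currently run; `withCwd` is a made-up helper name.

```javascript
// Run fn with the working directory temporarily set to dir, then restore it,
// even if fn throws or its promise rejects.
async function withCwd(dir, fn) {
  const previous = process.cwd();
  process.chdir(dir);
  try {
    return await fn();
  } finally {
    process.chdir(previous);
  }
}

// Hypothetical use inside the function's index.js:
//   const browser = await withCwd(__dirname, () => puppeteer.launch());
```

The working directory is process-wide, so other invocations running in the same process would still see the changed directory while the launch is in flight.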

This became a really long post, but hopefully it clears up why we have this issue. As I wrote we have already (seemingly) found a solution to the issue, but I'm open to suggestions for anything we can improve.

@anthonychu
Member

Glad you got it to work. I think I understand what you're doing and why it works. That's a pretty non-standard setup so, as you've discovered, you're more or less on your own when it comes to figuring out how to deploy it.

I'm not super familiar with the Azure DevOps task, but you should be able to build everything as you like it in your pipeline, then zip it up and deploy it without running a remote build.

@balag0 is there a way to deploy from Azure DevOps without running a remote build?

@rudfoss
Author

rudfoss commented Sep 8, 2020

I kind of agree and then kind of not ;)

The thing is we don't want to use the Visual Studio Code deployment tools as we want our builds to run through our build pipeline, which does testing and validation as well. Having individual devs deploying code directly to Azure Functions in production is also not a feasible scenario. So from that alone we must drop the direct deployment extensions.

Our build pipeline produces a set of files that we can basically copy into the Azure Function app. It probably does some trickery behind the scenes to hook everything up, but apart from that it should be nothing more than a file copy with JavaScript files. From what I can tell this also does not seem that far off from a normal deployment where we have an artifact (our functions) produced from a build that we then deploy using the deployment task.

As far as I can see the biggest non-standard thing we are doing is not wanting to run npm install during deployment. How we produce the actual JavaScript files should not matter to the Azure Function IMHO.

@anthonychu
Member

100% agree that you want your pipeline to deploy the app and I'm not suggesting you do anything different. What I am hoping for is that you can configure the pipeline to build the app as you'd expect it to run in Azure (with node_modules at the app root and containing all necessary dependencies). Then the pipeline can take that artifact and simply deploy it without triggering a remote build. This should be possible.

What you have is pretty close, except the process.chdir(__dirname) part is a bit hacky and we should be able to get rid of it.

@rudfoss
Author

rudfoss commented Sep 8, 2020

Great! That is exactly what I was hoping we could achieve.

I totally agree that the process.chdir line is a hack so getting rid of it would be great. Ideally I'd like to bundle puppeteer with webpack as well and just have Chromium install as a standalone thing, but I get why that could be more trouble than it's worth.

We'd still like to keep bundling with webpack the way we do now, because our build process has a really simple escape hatch for configuring external dependencies such as puppeteer (we just name them in a json file and our build will produce a subset of package.json based on it).
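Roughly, the escape hatch works like the sketch below. It is simplified from the idea rather than copied from our build; the function name `subsetPackageJson` and the shape of the inputs are illustrative.

```javascript
// Build a minimal package.json object containing only the named dependencies,
// with version ranges copied from the full manifest.
function subsetPackageJson(fullManifest, externalNames) {
  const dependencies = {};
  for (const name of externalNames) {
    const version = (fullManifest.dependencies || {})[name];
    if (version === undefined) {
      throw new Error(`"${name}" is not a dependency of ${fullManifest.name}`);
    }
    dependencies[name] = version;
  }
  return {
    name: fullManifest.name,
    version: fullManifest.version,
    private: true,
    dependencies,
  };
}

// Hypothetical usage in a build script (names and versions are made up):
//   const subset = subsetPackageJson(require("./package.json"), ["puppeteer"]);
//   fs.writeFileSync("dist/function-pdf/package.json", JSON.stringify(subset, null, 2));
```

Because the version ranges are copied unchanged, the monorepo yarn.lock placed next to the generated file should still pin the exact versions.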

I was thinking that having some indicator to tell the deployment that we have already done the npm install step ourselves would be a flexible way to resolve this. It would support any build pipeline where you'd want control over the install process for any reason.

@v-anvari

@anthonychu / @balag0, any thoughts on this request?

@lensbart

I’m running into the same issue, albeit for different reasons. I think at the simplest, a configuration option to choose a version of npm or Yarn would suffice. But zipping an app that has been built using yarn, and running it via npm is not a solid solution:

  • Yarn 2 Plug 'n' Play offers a new module resolution mechanism which is not compatible with npm
  • Yarn workspaces install some modules used by submodules in the project root, such that they cannot be found when the submodule is run via npm

I've run into both of these issues when trying to deploy a Yarn project via an Azure DevOps pipeline. Is there any way azure-pipelines.yml could be configured so that the app is deployed in an environment that supports Yarn?

I can provide more information about our setup if wanted.

Thanks in advance!

@fabiocav fabiocav removed this from the Triaged milestone Jan 23, 2024